HealthVerity
Senior Data Engineer
As a senior data engineer on the data platform team, you will support and enhance the platform behind HealthVerity's petabyte-scale core data asset. You will work closely with other engineers, data scientists, and business leaders to ensure that our data platform is available, secure, and reliable. You will use your strong engineering and product mindset to understand business needs and develop scalable engineering solutions that support HealthVerity's product roadmap and vision, while continuously looking for opportunities to simplify, automate tasks, and build reusable components.

What you will do:
- Engineer efficient, adaptable, and scalable data pipelines to process structured and unstructured data
- Develop and maintain data pipelines to efficiently process and analyze large amounts of streaming data
- Collaborate with other data engineers to maintain a cohesive and standardized data infrastructure
- Work closely with the software engineering team to integrate data pipelines into the overall platform architecture
- Collaborate with cross-functional teams, including software engineers, data scientists, product managers, and analysts, to understand data needs and deliver valuable platform enhancements that support the overall HealthVerity vision and roadmap
- Identify and implement solutions to optimize data storage, retrieval, and processing
- Continuously evaluate and improve data engineering processes and systems to increase efficiency and scalability
- Stay up to date with emerging technologies and industry trends in data engineering
- Ensure data security and compliance with privacy regulations
- Troubleshoot and resolve data-related issues in a timely manner
- Leverage large-scale distributed computing and serverless architectures, including Spark and AWS Lambda, to develop pipelines for transforming data
- Partner with the product teams to understand product goals and provide data that enables us to respond to customer and regulatory data requests
- Monitor data quality and proactively identify and resolve data issues

Our team leverages the following technologies in our day-to-day development process: GitHub (including CI/CD with GitHub Actions), Python, Postgres, AWS cloud-native technologies (CDK, Lambda, S3, EMR, ECS, SQS, EventBridge, Aurora, CloudWatch, and more), Spark, Databricks (SQL, Delta Live Tables, Unity Catalog, Audit Logs, Workflows), Docker/Kubernetes, Airflow, Hive SQL, and Infrastructure as Code (IaC) tools such as Terraform, YAML, and Helm charts.

You will also:
- Lead the design and implementation of scalable data solutions
- Proactively identify and address data quality and compliance issues
- Share knowledge across teams
- Contribute to strategic decisions regarding data architecture and tooling

What we're looking for:
- You are proficient in at least one primary language (e.g., Java, Scala, Python) and advanced SQL (any variant)
- You have experience with Databricks pipeline automation, AWS EMR, AWS S3, Snowflake, Spark, and Docker
- You have 8+ years of industry experience and proficiency in building distributed data pipelines for both batch and real-time processing (experience with Spark, Hive, Iceberg, Kafka, or Snowflake is helpful but not strictly required)
- You have a product mindset: you understand business needs and develop scalable engineering solutions
- You are always looking for opportunities to simplify, automate tasks, and build reusable components across multiple use cases and teams
- You have strong communication skills to collaborate with cross-functional partners and drive projects
- You are curious and eager to work across a variety of engineering specialties (e.g., data science, data engineering, and machine learning, to name a few)
- You have strong knowledge of Databricks features and functionality, such as Unity Catalog, Audit Logs, Databricks SQL, and Delta Live Tables
- You have experience with CI/CD pipelines and DataOps
- You have an eye for detail and like to spark joy among your partners with well-documented, high-quality data products that are well modeled and easy to understand
- You are able to independently lead large, complex systems design and implementation challenges
- You have experience using Infrastructure as Code (IaC) tools such as Terraform, YAML, and Helm charts

Base salary for the role is commensurate with experience and can range between $120,000 and $200,000, plus an annual bonus opportunity.