PathAI
Overview
Contract duration: minimum 6 months. Remote work from anywhere within the U.S. Opportunity to work with cutting-edge technology in machine learning and data infrastructure. Collaborative environment with cross-functional teams. Chance to significantly impact patient outcomes through improved data management.
Responsibilities
Analyze and optimize storage strategies for machine learning experiment data and metadata. Design and implement intelligent retention and expiration policies for large-scale datasets. Modernize and refactor ETL pipelines to improve scalability and maintainability. Build and enhance database-backed applications that support machine learning research and production analytics. Collaborate with machine learning engineers, site reliability engineers, and platform teams.
Qualifications
Proven expertise with relational databases (e.g., Postgres, Amazon RDS, Aurora), including schema design, query optimization, and performance tuning. Strong experience with ETL development and cloud data warehousing (e.g., Snowflake, Redshift). Familiarity with big data deployments and scalable architectures such as Spark and Hive. Experience with Apache Airflow for systems automation. Proficiency in Python for application development, data processing, and automation.
Preferred Qualifications
Background in machine learning data pipelines or analytics-heavy environments. Knowledge of data governance, retention policies, or cost-optimization strategies in cloud environments.