The Applied Research Laboratory at Penn State University
Data Engineer – Data Architecture for Data Science & Machine Learning
The Visualization and Decision Support Division of Penn State ARL is seeking a Senior Data Engineer with deep expertise in database design, optimization, and data access strategies to support data science and machine learning initiatives.
Application Instructions
Current Penn State employees (faculty, staff, technical service, or student): please log in to Workday to complete the internal application process.
External applicants: click “Apply” and complete the application process.
Position Specifications
Location: State College, PA or Reston, VA.
Estimated salary range: $109,300.00 – $219,600.00.
Remote and hybrid work approval is not guaranteed.
Responsibilities
Design and maintain scalable, high-performance database solutions for data science workflows and ML experimentation.
Partner with data scientists to understand data access patterns and develop storage strategies that accelerate analysis and model training.
Serve as the internal subject matter expert on PostgreSQL, including schema design, indexing, partitioning, and query optimization.
Evaluate and integrate alternative database technologies (e.g., MongoDB, Neo4j, Redis, Cassandra) where they provide clear advantages.
Lead efforts to optimize data pipelines for structured and unstructured data used in algorithm development.
Ensure data integrity, security, and governance across storage systems.
Implement monitoring, automation, and performance-tuning tools for all database environments.
Advise on data lifecycle management to balance accessibility for R&D with efficiency and compliance requirements.
Required Skills / Experience
5+ years of experience in data engineering, database architecture, or related technical roles.
Expert-level proficiency in PostgreSQL (query tuning, schema design, indexing, partitioning, replication).
Strong understanding of data modeling, normalization vs. denormalization trade-offs, and query optimization.
Experience with non-relational databases (MongoDB, Cassandra, Neo4j, Redis, or DynamoDB).
Familiarity with machine learning workflows and data consumption for training, evaluation, and deployment.
Experience with cloud database services (AWS RDS/Aurora, GCP Cloud SQL, Azure Database).
Proficiency in SQL and one or more scripting languages (Python preferred).
Excellent communication and collaboration skills – comfortable working closely with data scientists, ML engineers, and software developers.
Preferred Skills / Experience
Experience architecting hybrid data ecosystems spanning relational, NoSQL, and analytical databases.
Knowledge of data lake, warehouse, and feature store architectures (Snowflake, Redshift, BigQuery, Feast).
Familiarity with ETL/ELT frameworks and data orchestration tools (Airflow, dbt).
Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
Minimum Education, Work Experience & Required Certifications
Research and Development Engineer – Advanced Professional: Bachelor’s Degree – Engineering or Science; 5+ years of relevant experience.
Research and Development Engineer – Senior Professional: Bachelor’s Degree – Engineering or Science; 14+ years of relevant experience.
Research and Development Engineer – Principal Professional: Bachelor’s Degree – Engineering or Science; 19+ years of relevant experience.
Background Checks / Clearances
Employment requires a successful background check and the ability to obtain a government security clearance. U.S. citizenship is required. Background investigations and pre-employment drug screening will be conducted in accordance with university policies.
Salary & Benefits
Salary range: $109,300.00 – $219,600.00 (may be impacted by geographic differential).
Benefits include comprehensive medical, dental, and vision coverage, retirement plans, paid time off, and a 75% tuition discount.
EEO Statement
Penn State is an equal opportunity employer and is committed to providing employment opportunities to all qualified applicants without regard to race, color, religion, age, sex, sexual orientation, gender identity, national origin, disability or protected veteran status.