Crustdata (YC F24)
Senior Data Platform Engineer
Crustdata (YC F24), San Francisco, California, United States, 94199
Base pay range
$140,000.00/yr - $200,000.00/yr
This range is provided by Crustdata (YC F24). Your actual pay will be based on your skills and experience; talk with your recruiter to learn more.

The Role
We are looking for a foundational member of our engineering team: a highly motivated Software Engineer to own the design, creation, and evolution of our data platform. You will be part of the team that owns the data ingestion and management infrastructure that powers Crustdata’s capabilities. If you are passionate about building robust, scalable data systems and want to see your work directly influence customers, this is the role for you.

What You'll Do
- Architect & Build: Design, build, and maintain our core data infrastructure, including our data warehouse and data lake, using modern cloud technologies (AWS, GCP, or Azure).
- Pipeline Development: Develop and scale robust, fault-tolerant data pipelines (ETL/ELT) to ingest and process massive volumes of structured and unstructured data from diverse sources.
- Enable Data Science & ML: Create the foundational platform to support our data scientists and ML engineers. This includes building systems for feature engineering, model training, and deploying ML models into production.
- Orchestration at Scale: Implement and manage workflow orchestration for hundreds of daily data jobs, ensuring reliability, monitorability, and efficiency using tools like Airflow, Dagster, or Prefect.
- Real-time Infrastructure: Build and manage real-time data streaming pipelines using technologies like Kafka or Flink to power live dashboards and time-sensitive product features.
- Data Quality & Governance: Champion data quality and reliability. Implement frameworks for data validation, testing, and monitoring to ensure our data is accurate and trustworthy.

Who You Are
- Experience: You have 3+ years of professional software engineering experience, with a significant focus on data engineering or building backend systems at scale.
- Strong Coder: You possess strong programming skills in Python or another modern language (e.g., Java, Go).
- Big Data Expertise: You have hands-on experience with modern big data technologies such as Spark, Flink, or Dask.
- Pipeline Orchestration: You have practical experience with workflow management tools like Temporal, Airflow, Dagster, or Prefect.
- Problem Solver: You are a pragmatic problem-solver who can navigate ambiguity, manage complexity, and take ownership of projects from inception to completion.
- Startup Mentality: You are excited to work in a fast-paced, collaborative environment and wear multiple hats.

Nice to Haves
- Experience with real-time streaming technologies (Kafka, Pulsar, Kinesis).
- Familiarity with containerization and orchestration (Docker, Kubernetes).
- Knowledge of modern data warehousing and lakehouse architectures (e.g., Delta Lake, Iceberg).

Seniority level
Mid-Senior level

Employment type
Full-time

Job function
Engineering and Information Technology

Industries
Software Development