Greylock Partners
Founding Data Engineer (ML Pipelines)
Greylock Partners, Redwood City, California, United States, 94061
An early‑stage cybersecurity investment (valued at over $100M at Seed), founded by a successful serial entrepreneur, is looking to hire a Founding Data Engineer with a strong background in supporting full ML pipelines. Bonus points for prior industry experience in cybersecurity.
Summary
You will support machine learning efforts by building and maintaining the real‑time data pipelines that feed models with reliable, high‑quality information. Your work will center on ingesting, transforming, and organizing massive data streams so that training and inference systems have consistent, accurate inputs. In this role, you will own the backbone of ML workflows, ensuring scalability, low latency, and data governance. Your success will be measured by how efficiently and reliably data flows through the ecosystem, from raw sources to feature stores and production endpoints.
Key Qualifications
Degree in CS (or a related field) with 4+ years of related industry experience in data engineering
Prior zero‑to‑one experience is a strong plus
Proven ability to collaborate across different teams and adapt to new fields
Responsibilities
Design and maintain real‑time data pipelines to support large‑scale machine learning workflows, ensuring low‑latency ingestion and high reliability.
Build and optimize data infrastructure for feature extraction, model training, and online inference using modern streaming and orchestration frameworks.
Collaborate with ML researchers and platform engineers to integrate models into production systems and enable continuous data feedback loops.
Implement robust data quality, observability, and governance practices to ensure scalability, compliance, and reproducibility across enterprise environments.
Location: Redwood City, CA
Seniority level: Mid‑Senior level
Employment type: Full‑time
Job function: Engineering and Information Technology
Industries: Software Development, Computers and Electronics Manufacturing, IT Services and IT Consulting