Parallel

Senior ML Ops Engineer (Machine Learning Infrastructure)

Parallel, Los Angeles, California, United States, 90079

Overview

Parallel Systems is pioneering autonomous battery-electric rail vehicles designed to transform freight transportation by shifting portions of the $900 billion U.S. trucking industry onto rail. Our innovative technology offers cleaner, safer, and more efficient logistics solutions. Join our dynamic team and help shape a smarter, greener future for global freight.

Responsibilities

Design and implement robust MLOps solutions, including automated pipelines for data management, model training, deployment, and monitoring.
Architect, deploy, and manage scalable ML infrastructure for distributed training and inference.
Collaborate with ML engineers to gather requirements and develop strategies for data management, model development, and deployment.
Build and operate cloud-based systems (e.g., AWS, GCP) optimized for ML workloads in R&D and production environments.
Build scalable ML infrastructure to support continuous integration/deployment, experiment management, and governance of models and datasets.
Support the automation of model evaluation, selection, and deployment workflows.

What Success Looks Like

After 30 Days: Develop a deep understanding of product goals, existing infrastructure, and stakeholder requirements. Conduct technical discovery and propose a preliminary MLOps architecture, evaluating ML tools, cloud services, and workflow strategies with pros/cons clearly outlined.

After 60 Days: Deliver a detailed design document outlining the end-to-end ML pipeline, including data ingestion, model training, deployment, and monitoring. Iterate on the design based on feedback and build a PoC for the core ML workflow aligned with the approved architecture.

After 90 Days: Deliver core features of the MLOps pipeline and integrate key tools (e.g., MLflow, SageMaker, or Kubeflow). Begin implementing remaining features to support scalable, repeatable workflows for model experimentation and deployment in both R&D and production environments.

Basic Requirements

Bachelor’s or higher degree in Computer Science, Machine Learning, or a relevant engineering discipline.
5+ years of experience building large-scale, reliable systems, including 2+ years focused on ML infrastructure or MLOps.
Proven experience architecting and deploying production-grade ML pipelines and platforms.
Strong knowledge of the ML lifecycle: data ingestion, model training, evaluation, packaging, and deployment.
Hands-on experience with MLOps tools (e.g., MLflow, Kubeflow, SageMaker, Airflow, Metaflow, or similar).
Deep understanding of CI/CD practices applied to ML workflows.
Proficiency in Python, Git, and system design with solid software engineering fundamentals.
Experience with cloud platforms (AWS, GCP, or Azure) and designing ML architectures in those environments.

Preferred Qualifications

Experience with deep learning architectures (CNNs, RNNs, Transformers) or computer vision.
Hands-on experience with distributed training tools (e.g., PyTorch DDP, Horovod, Ray).
Background in real-time ML systems and batch inference, including CPU/GPU-aware orchestration.
Previous work in autonomous vehicles, robotics, or other real-time ML-driven systems.

Compensation & Equal Opportunity

We are committed to providing fair and transparent compensation in accordance with applicable laws. The target hiring range for this position is listed below and reflects the expected range for new hires in this role, based on factors such as skills, experience, qualifications, and location. Final compensation may vary and will be determined during the interview process.

Target Salary Range: $150,000 - $240,000 USD

Parallel Systems is an equal opportunity employer committed to diversity in the workplace. All qualified applicants will receive consideration for employment without regard to any discriminatory factor protected by applicable federal, state, or local laws. We work to build an inclusive environment in which all people can come to do their best work.

Parallel Systems is committed to the full inclusion of all qualified individuals. As part of this commitment, Parallel Systems will ensure that persons with disabilities are provided reasonable accommodations. If reasonable accommodation is needed to participate in the job application or interview process, to perform essential job functions, and/or to receive other benefits and privileges of employment, please contact your recruiter.
