Sonatus

Staff AI Engineer, Inference & Optimization

Sonatus, Sunnyvale, California, United States, 94087


Overview

Join a high-performing team redefining what cars can do in the era of the Software-Defined Vehicle (SDV). Sonatus is driving the transformation to AI-enabled software-defined vehicles. Our technology is already in production across more than 5 million vehicles and is expanding rapidly. Headquartered in Sunnyvale, CA, with 250+ employees worldwide, we combine the agility of a fast-growing company with the scale of an established partner. We are solving some of the most interesting challenges in the industry and shaping the future of mobility.

Responsibilities

- Design, build, and maintain robust pipelines and runtime environments for deploying and serving machine learning models at the edge, ensuring high availability, low latency, and efficient resource utilization for inference at scale.
- Collaborate with researchers and hardware engineers to optimize models for performance, latency, and power consumption on specific hardware (GPUs, TPUs, NPUs, FPGAs), with a focus on inference optimization techniques such as quantization, pruning, and knowledge distillation.
- Use AI compilers and specialized software stacks (e.g., TensorRT, OpenVINO, TVM) to accelerate model execution and optimize for target hardware.
- Design, build, and maintain MLOps pipelines for deploying models to edge devices, with an emphasis on performance and efficiency constraints.
- Implement and maintain monitoring and alerting systems to track model performance, data drift, and overall model health in production.
- Work with cloud platforms and on-device environments to provision and manage infrastructure for scalable and reliable model serving.
- Identify and resolve issues related to model performance, deployment failures, and data discrepancies, focusing on inference bottlenecks.
- Collaborate with Machine Learning Engineers, Software Engineers, and Product Managers to bring models from design to high-performance production systems.

Qualifications

- Minimum 7 years of work experience in MLOps or a similar role, with a strong focus on high-performance machine learning systems.
- Proven experience with inference optimization techniques (quantization, pruning, model distillation).
- Hands-on experience with hardware acceleration for ML, including GPUs, TPUs, and NPUs, and their related software ecosystems.
- Experience with AI compilers and runtimes such as TensorRT, OpenVINO, and TVM.
- Experience deploying and managing ML models on edge devices (e.g., NVIDIA Jetson, Raspberry Pi, NXP, Renesas).
- Strong experience in designing and building distributed systems, with familiarity with gRPC, MQTT, and efficient data-handling techniques.
- Hands-on experience with ML frameworks (PyTorch, TensorFlow, TFLite, ONNX).
- Proficiency in Python and C++.
- Solid understanding of ML concepts, the ML lifecycle, and the challenges of deploying models at scale.
- Proficiency with containers (Docker, Kubernetes) and cloud platforms (AWS, Azure).
- Experience applying CI/CD principles and tools to ML workflows.
- Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field.

Benefits

- Stock option plan
- Health care plan (medical, dental & vision)
- Retirement plan (401k, IRA)
- Life insurance (basic, voluntary & AD&D)
- Unlimited paid time off (vacation, sick & public holidays)
- Family leave (maternity, paternity)
- Flexible work arrangements
- Free food & snacks in the office

Salary

The posted salary range is a general guideline. Pay is based on factors such as scope, qualifications, location, and market rates. The pay range for this role is $197,500 to $260,000 USD.

Other information

To all recruitment agencies: Sonatus does not accept unsolicited agency resumes and is not responsible for any fees arising from unsolicited submissions.
