Apple
Apple Ray Inference Engineer
Cupertino, California, United States

The Apple Data Platform (ADP) group builds the data platform that enables the next generation of intelligent experiences across all Apple products and services. ADP empowers Apple engineers to deliver ML-driven products and innovations rapidly and at scale. We are looking for an experienced engineer who can bring their passion for machine learning, infrastructure, big data, and distributed systems to build and serve world-class data and ML platforms at scale. You will work with many cross-functional teams and lead the planning, execution, and success of technical projects with the ultimate purpose of improving the ML experience for Apple customers, with a focus on designing, deploying, and optimizing model inference. Are you passionate about building scalable, reliable, maintainable infrastructure and solving data problems at scale? Come join us and be part of the Data Infrastructure journey.

Responsibilities
As a member of the Apple Ray team, your responsibilities will include:

- Designing, implementing, and maintaining distributed systems to build world-class ML platforms and products at scale
- Experimenting with, deploying, and managing LLMs in a production context
- Benchmarking and optimizing inference deployments for different workloads, e.g. online vs. batch vs. streaming
- Diagnosing, fixing, improving, and automating complex issues across the entire stack to ensure maximum uptime and performance
- Designing and extending services to improve the functionality and reliability of the platform
- Monitoring system performance, optimizing for cost and efficiency, and resolving any issues that arise
- Building relationships with stakeholders across the organization to better understand internal customer needs and improve our product for end users

Minimum Qualifications
Required:

- 5+ years of experience in distributed systems with deep knowledge of computer science fundamentals
- Experience managing deployments of LLMs at scale
- Experience with inference runtimes/engines, e.g. ONNX Runtime, TensorRT, vLLM, SGLang
- Experience with ML training/inference profiling and optimization for different workloads and tasks, e.g. online, batch, and streaming inference
- Experience profiling ML models for different end use cases, e.g. RAG vs. code completion
- Experience with containerization and orchestration technologies such as Docker and Kubernetes
- Experience delivering data and machine learning infrastructure in production environments
- Experience configuring, deploying, and troubleshooting large-scale production environments
- Experience designing, building, and maintaining scalable, highly available systems that prioritize ease of use
- Experience with alerting, monitoring, and remediation automation in a large-scale distributed environment
- Extensive programming experience in Java, Python, or Go
- Strong collaboration and communication (verbal and written) skills
- B.S., M.S., or Ph.D. in Computer Science, Computer Engineering, or equivalent practical experience

Preferred Qualifications
Preferred:

- Understanding of the ML lifecycle and state-of-the-art ML infrastructure technologies
- Familiarity with CUDA and kernel implementation
- Experience with inference optimization and fine-tuning techniques (e.g. pruning, distillation, quantization)
- Experience deploying and optimizing ML models on heterogeneous hardware, e.g. GPUs, TPUs, Inferentia
- Experience with GPU and other types of HPC infrastructure
- Experience with training frameworks such as PyTorch, TensorFlow, or JAX
- Deep understanding of Ray and KubeRay

Pay & Benefits
At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $181,100 and $318,400, and your base pay will depend on your skills, qualifications, experience, and location. Apple employees also have the opportunity to become Apple shareholders through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards and can purchase Apple stock at a discount by voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Apple is an equal opportunity employer committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics.