Apple Inc.
Sr. Machine Learning Engineer, Apple Services Engineering
Apple Inc., Seattle, Washington, United States, 98127
Software and Services
The Applied Machine Learning team builds production multimodal systems that understand and transform large-scale image, audio, and video content. Our work spans diffusion-based image generation, transcription and diarization, face and object detection, OCR and image description for search, and automated quality control of media pipelines. We are looking for a Staff Machine Learning Engineer to strengthen our diffusion and video models, adapt small and mid-sized LLMs, and turn our uniquely large corpus of weakly labeled media into a durable product advantage. This is an opportunity to shape the next generation of multimodal experiences end-to-end, from data and models to evaluation and user impact.
Description
The Staff Machine Learning Engineer, Multimodal Generation & Post-Training, will be a senior individual contributor on a small, applied ML team focused on production multimodal systems. The engineer will lead fine-tuning and adaptation of diffusion and emerging video models, as well as post-training of small and mid-sized LLMs for captioning, moderation, and retrieval-friendly descriptions. The engineer will also design data and evaluation workflows that use our large archive of weakly labeled music, podcast, film, TV, and short-form content to drive measurable quality and efficiency improvements. The role includes close collaboration with partner infrastructure teams on model serving and with adjacent product and research groups to bring new capabilities into production.
Minimum Qualifications
Master’s degree in Computer Science, Electrical Engineering, or a related technical field, or equivalent practical experience.
5+ years of hands‑on industry experience building and shipping machine learning systems to production.
Proven experience training and fine‑tuning diffusion or other image/video generative models, including adapter‑based methods such as LoRA.
Proficiency in Python and at least one major deep learning framework such as PyTorch.
Experience designing and operating ML pipelines for noisy or weakly labeled data, including offline evaluation and monitoring in production.
Strong software engineering skills, including code quality, experimentation discipline, and debugging/profiling of model performance.
Preferred Qualifications
PhD in Computer Science, Machine Learning, or a related technical field.
8+ years of industry experience with production multimodal systems spanning image, audio, and/or video.
Deep expertise with diffusion and video generation techniques (e.g., ControlNet/IP-Adapter, temporal consistency methods, sampling and latency optimization).
Experience with PEFT/QLoRA and post‑training approaches such as DPO or related preference‑based methods for small and mid‑sized LLMs.
Background in ASR/VAD/diarization, OCR, multimodal retrieval, or face recognition with fine‑grained temporal alignment.
Experience collaborating with infra/platform teams on model serving (e.g., batching strategies, quantization, observability) and translating requirements into reliable production deployments.
Demonstrated ability to define metrics, build evaluation harnesses, and communicate results clearly to cross‑functional partners.
Track record of publications, patents, or open‑source contributions in relevant areas of machine learning or multimodal modeling.
At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $201,300 and $302,200, and your base pay will depend on your skills, qualifications, experience, and location.
Apple employees also have the opportunity to become an Apple shareholder through participation in Apple’s discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards and can purchase Apple stock at a discount by voluntarily participating in Apple’s Employee Stock Purchase Plan. You’ll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments, as well as relocation. Learn more about Apple Benefits.
Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.
Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.
Apple accepts applications to this posting on an ongoing basis.