Apple Inc.
AIML - Machine Learning Engineer, Foundation Model Services
Apple Inc., Seattle, Washington, US, 98127
Description
Work alongside the Foundation Model Research team to optimize inference for cutting-edge model architectures. Collaborate closely with product teams to develop production-grade solutions for launching models that serve millions of customers in real time. Build tools to identify bottlenecks in inference across different hardware and use cases. Mentor and guide engineers within the organization.
Minimum Qualifications
Demonstrated experience in leading and managing complex, ambiguous projects. Experience with high-throughput services at supercomputing scale. Proficiency in deploying applications on cloud platforms (AWS, Azure, or equivalent) using Kubernetes and Docker. Knowledge of GPU programming with CUDA and familiarity with machine learning frameworks like PyTorch or TensorFlow.
Preferred Qualifications
Experience in building and maintaining systems in modern languages (e.g., Go, Python). Understanding of deep learning architectures such as Transformer models and encoder/decoder models. Familiarity with NVIDIA TensorRT-LLM, vLLM, DeepSpeed, and NVIDIA Triton Inference Server. Experience writing custom GPU kernels using CUDA or OpenAI Triton.