OpenAI

Software Engineer, Inference - Multi Modal

OpenAI, San Francisco, California, United States, 94199

About The Team

OpenAI’s Inference team powers the deployment of our most advanced models (including GPT, Image Generation, and Whisper) across various platforms. We focus on ensuring these models are available, performant, and scalable in production, collaborating closely with Research to bring new models into the world. Our team is small, fast-moving, and dedicated to delivering a world-class developer experience while pushing AI boundaries.

We are expanding into multimodal inference, building infrastructure for models that handle image, audio, and other non-text modalities. These workloads are more heterogeneous and experimental, involving diverse model sizes, complex input/output formats, and tighter coordination with product and research teams.

About The Role

We’re seeking a software engineer to help serve OpenAI’s multimodal models at scale. You will be part of a small team responsible for creating reliable, high-performance infrastructure for real-time audio, image, and other multimodal workloads in production. This role involves cross-functional collaboration with researchers and product teams to deploy advanced capabilities that enable new modes of interaction with models.

In This Role, You Will

Design and implement inference infrastructure for large-scale multimodal models.
Optimize systems for high-throughput, low-latency delivery of image and audio inputs and outputs.
Support experimental research workflows transitioning into production services.
Collaborate with researchers, infrastructure teams, and product engineers to deploy state-of-the-art capabilities.
Improve system-level performance, including GPU utilization, tensor parallelism, and hardware abstraction layers.

You Might Thrive In This Role If You

Have experience building and scaling inference systems for LLMs or multimodal models.
Have worked with GPU-based ML workloads and understand performance dynamics of large models involving complex data like images or audio.
Enjoy experimental, fast-evolving work and close collaboration with research teams.
Are comfortable with systems spanning networking, distributed compute, and high-throughput data handling.
Have familiarity with inference tooling like vLLM, TensorRT-LLM, or custom model parallel systems.
Own problems end-to-end and thrive in ambiguous, fast-paced environments.

Nice To Have

Experience working with image generation or audio synthesis models in production.
Exposure to distributed ML training or system-efficient model design.

About OpenAI

OpenAI is dedicated to ensuring that general-purpose AI benefits all humanity. We push AI capabilities forward and seek to deploy them safely and ethically. We value diversity and are an equal opportunity employer. We consider qualified applicants with arrest or conviction records in line with applicable laws. We provide reasonable accommodations for applicants with disabilities. Join us in shaping the future of AI technology.
