Modal
The Role
We are looking for strong engineers with experience in making ML systems performant at scale. If you are interested in contributing to open-source projects and Modal's container runtime to push language and diffusion models towards higher throughput and lower latency, we'd love to hear from you!
Requirements
5+ years of experience writing high-quality, high-performance code.
Experience working with torch, high-level ML frameworks, and inference engines (vLLM or TensorRT).
Familiarity with Nvidia GPU architecture and CUDA.
Experience with ML performance engineering (tell us a story about boosting GPU performance: debugging SM occupancy issues, rewriting an algorithm to be compute-bound, eliminating host overhead, etc.).
Nice-to-have: familiarity with low-level operating system foundations (Linux kernel, file systems, containers, etc.).
Ability to work in person in our NYC, San Francisco, or Stockholm office.