Black Forest Labs

Member of Technical Staff - Pretraining / Inference Optimization

Black Forest Labs, San Francisco, California, United States, 94199

At Black Forest Labs, we're on a mission to advance the state of the art in generative deep learning for media, building powerful, creative, and open models that push what's possible.

Born from foundational research, we continuously create advanced infrastructure to transform ideas into images and videos.

Our team pioneered Latent Diffusion, Stable Diffusion, and FLUX.1 - milestones in the evolution of generative AI. Today, these foundations power millions of creations worldwide, from individual artists to enterprise applications.

What you'll be doing:

- Finding ideal training strategies (parallelism, precision trade-offs) for a variety of model sizes and compute loads
- Profiling, debugging, and optimizing single- and multi-GPU operations using tools such as Nsight or stack trace viewers (a minimal timing sketch follows this list)
- Reasoning about the speed and quality trade-offs of quantization for model inference
- Developing and improving low-level kernel optimizations for state-of-the-art inference and training
- Innovating new ideas that bring us closer to the limits of a GPU
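As a rough illustration of the precision trade-off measurements this work involves, here is a minimal sketch, assuming PyTorch and a CUDA GPU; the sizes, dtypes, and iteration counts are illustrative, not from the posting:

```python
# Illustrative only: time a matmul at several precisions with CUDA events.
import torch

def bench_matmul(dtype: torch.dtype, n: int = 4096, iters: int = 10) -> float:
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    for _ in range(3):  # warm-up: exclude one-time costs (init, autotuning)
        a @ b
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()  # launches are async; wait before reading timer
    return start.elapsed_time(end) / iters  # mean milliseconds per matmul

for dtype in (torch.float32, torch.bfloat16, torch.float16):
    print(f"{dtype}: {bench_matmul(dtype):.2f} ms")
```

CUDA events are used instead of wall-clock time because kernel launches are asynchronous; synchronizing before reading the timer avoids measuring only launch overhead.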

Ideal Experience:

- Familiarity with the latest and most effective techniques for optimizing inference and training workloads
- Optimizing both memory-bound and compute-bound operations
- Understanding GPU memory hierarchy and compute capabilities
- Deep understanding of efficient attention algorithms
- Implementing both forward and backward Triton kernels and ensuring their correctness while accounting for floating point error (see the sketch after this list)
- Using, for example, pybind to integrate custom-written kernels into a PyTorch framework
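To make the Triton and floating-point-correctness points concrete, here is a minimal sketch, assuming Triton and a CUDA device (forward pass only for brevity; the kernel and tolerances are illustrative, not the team's code):

```python
# Illustrative only: a forward elementwise-add kernel in Triton, verified
# against PyTorch with explicit floating point tolerances.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements  # guard the partial block at the tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)  # one program instance per block
    add_kernel[grid](x, y, out, n, BLOCK=1024)
    return out

x = torch.randn(10_000, device="cuda", dtype=torch.float16)
y = torch.randn(10_000, device="cuda", dtype=torch.float16)
# Correctness under floating point error: compare with tolerances,
# not exact equality.
torch.testing.assert_close(add(x, y), x + y, rtol=1e-3, atol=1e-3)
```

Comparing with rtol/atol rather than exact equality reflects that fp16 rounding and accumulation order can differ between a custom kernel and the reference implementation.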

Nice to have:

- Experience with diffusion and autoregressive models
- Experience with low-level CUDA kernel optimizations

Base Annual Salary: $180,000 - $300,000 USD