Black Forest Labs
Member of Technical Staff - Pretraining / Inference Optimization
Black Forest Labs, San Francisco, California, United States, 94199
Black Forest Labs is a cutting-edge startup pioneering generative image and video models. Our team invented Stable Diffusion, Stable Video Diffusion, and FLUX.1, and we are currently seeking a strong researcher/engineer to work closely with our research team on pretraining and inference optimization.
Role:
Finding ideal training strategies (parallelism, precision trade-offs) for a variety of model sizes and compute loads
Profiling, debugging, and optimizing single- and multi-GPU operations using tools such as Nsight or stack trace viewers
Reasoning about the speed and quality trade-offs of quantization for model inference (see the sketch after this list)
Developing and improving low-level kernel optimizations for state-of-the-art inference and training
Innovating new ideas that bring us closer to the limits of a GPU
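As a minimal illustration of the quantization trade-off reasoning mentioned above (a sketch assuming plain PyTorch; none of the names below come from the posting), one could compare a full-precision matrix multiply against a per-channel int8 weight-quantized version and measure the output error that trades off against the speed gain:

import torch

torch.manual_seed(0)

# Illustrative only: symmetric per-output-channel int8 quantization of a weight matrix.
w = torch.randn(4096, 4096)          # full-precision weights
x = torch.randn(8, 4096)             # a small batch of activations

# One scale per output channel (row of the weight matrix).
scale = w.abs().amax(dim=1, keepdim=True) / 127.0
w_q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)

# Dequantize and compare against the full-precision reference output.
y_ref = x @ w.t()
y_deq = x @ (w_q.float() * scale).t()

rel_err = (y_ref - y_deq).norm() / y_ref.norm()
print(f"relative output error from int8 weights: {rel_err:.2e}")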
Ideal Experiences:
Familiarity with the latest and most effective techniques for optimizing inference and training workloads
Optimizing for both memory-bound and compute-bound operations
Understanding GPU memory hierarchy and computation capabilities
Deep understanding of efficient attention algorithms
Implementing both forward and backward Triton kernels and ensuring their correctness while considering floating point errors (a minimal forward-kernel sketch follows this list)
Using, for example, pybind to integrate custom-written kernels into PyTorch
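As a rough sketch of the Triton point above (illustrative only; the kernel, function names, and tolerances are assumptions rather than anything from the posting), the forward half of such a kernel and a tolerance-based correctness check against a PyTorch reference might look like this; a backward kernel and pybind-based C++/CUDA integration would follow the same pattern:

import torch
import triton
import triton.language as tl

@triton.jit
def silu_kernel(x_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    # Compute SiLU (x * sigmoid(x)) in fp32, then store back in the input dtype.
    x_f32 = x.to(tl.float32)
    y = x_f32 * tl.sigmoid(x_f32)
    tl.store(out_ptr + offsets, y.to(x.dtype), mask=mask)

def silu(x: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    silu_kernel[grid](x, out, n, BLOCK_SIZE=1024)
    return out

if __name__ == "__main__":
    x = torch.randn(1 << 20, device="cuda", dtype=torch.float16)
    # Exact equality is the wrong test under floating point error; compare with tolerances.
    torch.testing.assert_close(silu(x), torch.nn.functional.silu(x), rtol=1e-3, atol=1e-3)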
Nice to have:
Experience with Diffusion and Autoregressive models
Experience in low-level CUDA kernel optimizations