Black Forest Labs

Member of Technical Staff - Large scale data infrastructure

Black Forest Labs, San Francisco, California, United States, 94199


What if the ability to continually train improved models is just the capability to retrieve and process all our data?

Our founding team pioneered Latent Diffusion and Stable Diffusion - breakthroughs that made generative AI accessible to millions. Today, our FLUX models power creative tools, design workflows, and products across industries worldwide.

Our FLUX models are best-in-class not only for their capability, but for ease of use in developing production applications. We top public benchmarks and compete at the frontier - and in most instances we're winning.

If you're relentlessly curious and driven by high agency, we want to talk.

With a team of ~50, we move fast and punch above our weight. From our labs in Freiburg - a university town in the Black Forest - and San Francisco, we're building what comes next.

What You'll Pioneer

Develop and maintain scalable infrastructure to store and retrieve massive image and video datasets - the kind where "large" means billions of assets, not millions

Optimize data retrieval so that every training run can fully utilize all GPUs

Build tooling to efficiently manage datasets

Manage and coordinate data transfers from licensing partners

Ensure we use our object storage as efficiently as possible
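The second responsibility above - keeping GPUs fed during training - usually comes down to hiding object-storage latency behind the consumer. Below is a minimal sketch of an ordered prefetching loader; `fetch_object`, `prefetching_loader`, and the shard-key naming are hypothetical illustrations, not an existing API, and a real version would call an object-storage client (e.g. an S3 GET) where the placeholder fabricates bytes.

```python
import concurrent.futures
import queue

def fetch_object(key):
    # Placeholder for a real object-storage GET; here we just
    # fabricate bytes so the sketch is self-contained.
    return f"bytes-for-{key}".encode()

def prefetching_loader(keys, workers=8, depth=32):
    """Yield objects in submission order while up to `depth` GETs
    run ahead of the consumer, hiding per-request latency."""
    with concurrent.futures.ThreadPoolExecutor(workers) as pool:
        pending = queue.Queue()
        it = iter(keys)
        # Prime the pipeline with `depth` in-flight requests.
        for key in it:
            pending.put(pool.submit(fetch_object, key))
            if pending.qsize() >= depth:
                break
        # Steady state: hand one object to the consumer, start one GET.
        for key in it:
            yield pending.get().result()
            pending.put(pool.submit(fetch_object, key))
        # Drain the remaining in-flight requests.
        while not pending.empty():
            yield pending.get().result()

data = list(prefetching_loader([f"shard-{i}" for i in range(100)]))
```

Because completed futures are consumed through a FIFO queue, results come back in key order even though the underlying requests overlap - a design choice that trades a little throughput for deterministic batch composition.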

Questions We're Wrestling With

What formats give the best dataloading speed while preserving the flexibility to keep building on top of the data?

What are the actual bottlenecks and failure cases when retrieving data at scale?

How can we identify, prevent and route around data retrieval failures in individual processes?
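One common answer to the last question is per-request retry with backoff plus a fallback path, so a single bad object or endpoint cannot stall a training rank. The sketch below is illustrative only; `get_with_retry`, `flaky_fetch`, and the "primary"/"mirror" endpoints are hypothetical names, and returning `None` (so the caller can skip the sample) is just one possible policy.

```python
import random
import time

def get_with_retry(fetch, key, replicas=("primary", "mirror"), attempts=3):
    """Try each replica endpoint, with jittered exponential backoff
    between full passes, before giving up on a key."""
    delay = 0.05
    for _ in range(attempts):
        for endpoint in replicas:
            try:
                return fetch(endpoint, key)
            except IOError:
                continue  # route around this endpoint, try the next
        time.sleep(delay + random.uniform(0, delay))  # jittered backoff
        delay *= 2
    return None  # caller can skip the sample rather than crash the run

def flaky_fetch(endpoint, key):
    # Hypothetical stand-in for a real GET; "primary" is down.
    if endpoint == "primary":
        raise IOError("connection reset")
    return f"{key}@{endpoint}"

result = get_with_retry(flaky_fetch, "shard-0001")
```

Here `result` comes back from the mirror on the first pass; only when every replica fails does the backoff loop engage, keeping the happy path latency-free.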

Who Thrives Here

You've managed large-scale object storage with high retrieval rates in the past. You know the difference between infrastructure that works in theory and infrastructure that works when researchers depend on it.

Strong proficiency in Python and experience with various file systems for data-intensive manipulation and analysis

Experience building reliable and scalable data loaders for machine learning applications

Deep knowledge of cloud object storage and the challenges that come with it

Hands-on familiarity with object stores such as S3 and Azure Blob Storage, cloud platforms (AWS, GCP, or Azure), and Slurm/HPC environments for distributed data processing

Experience creating and managing storage infrastructure at PB scale

Experience working with large-scale image and video data

What We're Building Toward

We're not just maintaining infrastructure - we're building the computational foundation that determines what research is possible. We are designing the systems that will power all future training and data processing. If that sounds more compelling than keeping existing systems running, we should talk.
