Black Forest Labs
Member of Technical Staff - Large scale data infrastructure
Black Forest Labs, San Francisco, California, United States, 94199
What if the ability to continually train better models comes down to the capability to retrieve and process all of our data?
Our founding team pioneered Latent Diffusion and Stable Diffusion - breakthroughs that made generative AI accessible to millions. Today, our FLUX models power creative tools, design workflows, and products across industries worldwide.
Our FLUX models are best-in-class not only for their capability but also for their ease of use in building production applications. We top public benchmarks and compete at the frontier - and in most instances we're winning.
If you're relentlessly curious and driven by high agency, we want to talk.
With a team of ~50, we move fast and punch above our weight. From our labs in Freiburg - a university town in the Black Forest - and San Francisco, we're building what comes next.
What You'll Pioneer
Develop and maintain scalable infrastructure to store and retrieve massive-scale image and video datasets - the kind where "large" means billions of assets, not millions
Optimize data retrieval so that every training run can fully utilize all GPUs (see the sketch after this list)
Build tooling to efficiently manage datasets
Manage and coordinate data transfers from licensing partners
Make sure we are using our object storage as efficiently as possible
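To make the GPU-utilization point concrete, here is a minimal, hypothetical sketch of the pattern involved: streaming tar-sharded samples out of object storage while keeping a bounded number of shard downloads in flight, so parsing the current shard overlaps with fetching the next ones. The bucket, shard layout, and function names are illustrative assumptions, not a description of our actual stack.

```python
# Hypothetical sketch (not our actual stack): stream WebDataset-style
# tar shards from S3 while keeping a bounded number of downloads in
# flight, so shard parsing overlaps with shard fetching.
import io
import tarfile
from collections import deque
from concurrent.futures import ThreadPoolExecutor

import boto3


def _parse_shard(blob: bytes):
    # A shard is a tar archive whose members are individual samples.
    with tarfile.open(fileobj=io.BytesIO(blob)) as tar:
        for member in tar:
            if member.isfile():
                yield member.name, tar.extractfile(member).read()


def iter_samples(bucket: str, shard_keys, prefetch: int = 4):
    """Yield (name, bytes) samples with `prefetch` shards in flight."""
    s3 = boto3.client("s3")

    def fetch(key):
        # One GET per multi-hundred-MB shard amortizes request overhead
        # far better than one GET per individual sample.
        return s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    with ThreadPoolExecutor(max_workers=prefetch) as pool:
        window = deque()
        for key in shard_keys:
            window.append(pool.submit(fetch, key))
            if len(window) > prefetch:  # bound buffered shards
                yield from _parse_shard(window.popleft().result())
        while window:
            yield from _parse_shard(window.popleft().result())
```

Wrapped in per-rank worker processes, the same overlap-fetch-with-parse pattern is what keeps a training loop from ever waiting on I/O.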
Questions We're Wrestling With
What formats will give the best dataloading speed while maintaining the flexibility needed to keep building on top of the data?
What are the actual bottlenecks and failure cases when retrieving data at scale?
How can we identify, prevent, and route around data retrieval failures in individual processes? (See the sketch after this list.)
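For that last question, one common pattern - sketched here under assumed bucket names and retry budgets, not as our production logic - is jittered exponential backoff combined with a fallback replica, so a single flaky endpoint doesn't stall an individual dataloader process:

```python
# Hypothetical sketch: retry a flaky GET with jittered exponential
# backoff, then route around the failure via a replica bucket.
# Bucket names and the retry budget are illustrative assumptions.
import random
import time

import boto3
from botocore.exceptions import BotoCoreError, ClientError


def robust_get(key: str, buckets=("primary-data", "replica-data"),
               attempts: int = 3) -> bytes:
    s3 = boto3.client("s3")
    last_error = None
    for bucket in buckets:  # on exhaustion, route around via the next replica
        for attempt in range(attempts):
            try:
                return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            except (BotoCoreError, ClientError) as err:
                last_error = err
                # Jitter avoids synchronized retry storms from thousands
                # of dataloader workers hammering the same endpoint.
                time.sleep((2 ** attempt) * 0.1 * (1 + random.random()))
    raise last_error
```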
Who Thrives Here
You've managed large-scale object storage with high retrieval rates in the past. You know the difference between infrastructure that works in theory and infrastructure that works when researchers depend on it.
Strong proficiency in Python and experience with various file systems for data-intensive manipulation and analysis
Experience building reliable and scalable data loaders for machine learning applications
Deep, hands-on knowledge of cloud object storage (e.g., S3 or Azure Blob Storage) and the challenges that come with it, plus experience with cloud platforms (AWS, GCP, or Azure) and Slurm/HPC environments for distributed data processing
Experience creating and managing petabyte-scale storage infrastructure
Experience working with large-scale image and video data
What We're Building Toward
We're not just maintaining infrastructure - we're building the computational foundation that determines what research is possible. We are designing systems that will power all future training and data processing. If that sounds more compelling than keeping existing systems running, we should talk.