Black Forest Labs

Member of Technical Staff - Model Serving / API Backend Engineer

Black Forest Labs, San Francisco, California, United States, 94199


Black Forest Labs is a cutting‑edge startup pioneering generative image and video models. Our team, which invented Stable Diffusion, Stable Video Diffusion, and FLUX.1, is building and improving the API and model‑serving backend and its services.

Responsibilities

- Develop and maintain robust APIs for serving machine learning models
- Transform research models into production‑ready demos and MVPs
- Optimize model inference for improved performance and scalability
- Implement and manage user preference data acquisition systems
- Ensure high availability and reliability of model serving infrastructure
- Collaborate with ML researchers to rapidly prototype and deploy new models

Ideal Experience

- Strong proficiency in Python and its ecosystem for machine learning, data analysis, and web development
- Extensive experience with RESTful API development and deployment for ML tasks
- Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes)
- Knowledge of cloud platforms (AWS, GCP, or Azure) for deploying and scaling ML services
- Proven track record in rapid ML model prototyping using tools like Streamlit or Gradio
- Experience with distributed task queues and scalable model serving architectures
- Understanding of monitoring, logging, and observability best practices for ML systems

Nice to Have

- Experience with frontend development frameworks (e.g., Vue.js, Angular, React)
- Familiarity with MLOps practices and tools
- Knowledge of database systems and data streaming technologies
- Experience with A/B testing and feature flagging in production environments
- Understanding of security best practices for API development and ML model serving
- Experience with real‑time inference systems and low‑latency optimizations
- Knowledge of CI/CD pipelines and automated testing for ML systems
- Expertise in ML inference optimization techniques such as reducing initialization time and memory requirements, dynamic batching, reduced precision and weight quantization, TensorRT optimizations, layer fusion and model compilation, and custom CUDA code for performance enhancements

Seniority Level

Mid‑Senior level

Employment Type

Full‑time

Job Function

Engineering and Information Technology
