Fireworks AI
Member of Technical Staff, Multimedia
Fireworks AI, San Mateo, California, United States, 94409
About Us:
At Fireworks, we're building the future of generative AI infrastructure. Our platform delivers the highest-quality models with the fastest and most scalable inference in the industry. We've been independently benchmarked as the leader in LLM inference speed and are driving cutting-edge innovation through projects like our own function calling and multimodal models. Fireworks is a Series C company valued at $4 billion and backed by top investors including Benchmark, Sequoia, Lightspeed, Index, and Evantic. We're an ambitious, collaborative team of builders, founded by veterans of Meta PyTorch and Google Vertex AI.
The Role:
We are looking for highly motivated engineers and researchers to join our Multimedia team as Members of Technical Staff. In this role, you will help advance Fireworks AI's capabilities across speech, vision, and multimodal systems, from deploying and training models to building the infrastructure that powers real-time, production-ready AI experiences.
We welcome both generalists with broad multimedia expertise and specialists with a deeper focus in speech/audio or vision-language modeling. You will work at the intersection of research and engineering, turning cutting-edge model innovation into high-performance, scalable systems that power Fireworks AI's next-generation products.
Key Responsibilities:
- Design, train, and implement machine learning models for speech, vision, or multimodal applications, including ASR, TTS, image understanding, captioning, retrieval, and speech-to-speech systems
- Bring new model capabilities from research to production, ensuring high performance and reliability
- Build and optimize the infrastructure that supports distributed training, fine-tuning, and real-time inference across multimedia workloads
- Profile and address performance bottlenecks across the stack, from preprocessing and model training to deployment and serving
- Write high-quality, maintainable code for both experimentation and production systems
- Evaluate and integrate the latest research to improve model performance, scalability, and efficiency
- Work directly with internal and external stakeholders to define use cases, deliver custom optimizations, and inform the multimedia roadmap
- Contribute to open-source efforts and help shape the future of multimodal AI at Fireworks
Minimum Qualifications:
- Bachelor's degree in Computer Science, Electrical Engineering, or a related technical field
- 3-5+ years of experience in machine learning, backend infrastructure, or data-intensive systems
- Strong proficiency in Python and familiarity with deep learning frameworks such as PyTorch or TensorFlow
- Demonstrated experience in one or more of the following areas:
  - Speech or audio modeling (ASR, TTS, speech-to-speech)
  - Vision or vision-language modeling (captioning, VQA, retrieval, multimodal reasoning)
  - Backend or infrastructure engineering for ML workloads (training, inference, optimization, APIs)
- Experience building production-quality systems and collaborating across research and engineering teams
- Familiarity with cloud platforms (AWS, GCP, or Azure) and containerization/orchestration tools (Docker, Kubernetes)
Preferred Qualifications:
- Master's or PhD in a relevant technical field with research experience in speech, vision, or multimodal modeling
- Experience deploying and optimizing ML models in production, including distributed training, fine-tuning, or inference optimization
- Familiarity with model optimization techniques such as quantization, speculative decoding, or parameter-efficient fine-tuning (LoRA or QLoRA)
- Background in multimodal AI infrastructure or experience at a hyperscaler, AI infrastructure startup, or LLM platform
- Strong understanding of GPU performance, networking, and scaling for multimedia workloads
- Contributions to open-source projects or published first-author research papers at top-tier conferences such as NeurIPS, ICML, CVPR, ACL, or Interspeech
- Ability to thrive in a fast-paced, low-process environment and drive high-impact, company-defining work
Why Fireworks AI?
- Solve Hard Problems: Tackle challenges at the forefront of AI infrastructure, from low-latency inference to scalable model serving.
- Build What's Next: Work with bleeding-edge technology that impacts how businesses and developers harness AI globally.
- Ownership & Impact: Join a fast-growing, passionate team where your work directly shapes the future of AI. No bureaucracy, just results.
- Learn from the Best: Collaborate with world-class engineers and AI researchers who thrive on curiosity and innovation.
Fireworks AI is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all innovators.