Stealth Company
We're a well-funded stealth startup backed by proven unicorn founders, building the next generation of AI-powered consumer hardware. We're assembling a small, elite team to create revolutionary products that integrate cutting-edge voice, vision, and AI technologies. If you're excited about optimizing AI models for real-world deployment and shipping world-changing technology—we'd love to talk.
The Role
Join as our AI inference specialist to optimize and deploy the models that power our device. You'll work directly with founders who've built unicorn companies and know how to ship fast. This is ML optimization at its finest: converting, optimizing, and serving models for production at scale with low latency.
What You’ll Do
Optimize and convert AI models for production inference engines
Work with serving frameworks such as vLLM and SGLang
Optimize TTS, STT, vision, and multimodal models for deployment
Convert models to efficient deployment formats (ONNX, TensorRT engines, etc.)
Build and tune inference pipelines for low-latency requirements
Quantize and compress models while maintaining quality
Benchmark and profile model performance
Integrate optimized models with our device infrastructure
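As a rough illustration of the benchmarking and profiling work described above, here is a minimal, framework-agnostic latency harness. This is a sketch, not the company's actual tooling: the `benchmark` helper and the dummy workload are hypothetical, and in practice the measured callable would invoke the real inference engine (a vLLM, TensorRT, or ONNX Runtime session).

```python
import time
import statistics

def benchmark(fn, warmup=10, iters=100):
    """Time fn repeatedly and report latency percentiles in milliseconds."""
    for _ in range(warmup):
        fn()  # warmup runs: exclude cold-start and cache effects
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1e3)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[int(0.99 * (len(samples) - 1))],
        "mean_ms": statistics.fmean(samples),
    }

# Stand-in workload; a real harness would call the optimized model here,
# e.g. an ONNX Runtime session.run(...) or a vLLM generate(...) call.
stats = benchmark(lambda: sum(i * i for i in range(10_000)))
```

Tail percentiles (p99) rather than means are what matter for the low-latency targets above, which is why the harness sorts the samples and reports both.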
Requirements
4+ years experience with ML model optimization and deployment
Strong experience with inference engines (vLLM, SGLang, TensorRT, etc.)
Deep knowledge of model conversion and optimization (ONNX, quantization, pruning)
Experience optimizing TTS, STT, vision, or multimodal models
Strong Python and C++ programming skills
Understanding of GPU optimization and CUDA
Track record of deploying AI models in production
PyTorch, TensorFlow, or similar ML frameworks
Why Join
Build an ambitious product with real-world impact, from prototype to mass production
Work with founders who've built unicorn companies and know how to ship fast
Competitive compensation, equity, and the chance to shape the product—and the company
Small empowered team. No bureaucracy. Big upside.