Member of Technical Staff - Machine Learning Engineer, Inference ...
Liquid AI, Inc. - Oklahoma City, Oklahoma, United States
Overview
ML Engineer (Inference)
We are looking for an ML Engineer (Inference) to build and optimize the end-to-end serving stack for Liquid AI’s foundation models. You will develop the pipeline that turns a trained model checkpoint into a production-grade, low-latency API. This is a highly technical role operating at the frontier of AI inference research and production.
Desired Experience
- PyTorch
- Python
- Model-serving frameworks (e.g., TensorRT, vLLM, SGLang); a minimal usage sketch follows this list
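As a hedged illustration of the model-serving frameworks named above, the snippet below uses vLLM's offline Python API for batched generation. The model name is a placeholder, and the sketch assumes a GPU host with vllm installed; it is not specific to Liquid AI's stack.

```python
# Minimal vLLM offline-inference sketch. The model name below is a
# placeholder assumption; substitute any checkpoint you actually serve.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # small model for illustration
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = [
    "Explain KV-cache reuse in one sentence.",
    "Why does batching improve GPU utilization?",
]
for output in llm.generate(prompts, params):
    print(output.prompt, "->", output.outputs[0].text.strip())
```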
You're A Great Fit If
- You have experience building large-scale production stacks for model serving.
- You have a solid understanding of ragged batching, dynamic load balancing, KV-cache management, and other multi-tenant serving techniques.
- You have experience applying quantization strategies (e.g., FP8, INT4) while safeguarding model accuracy (see the quantization sketch after this list).
- You have deployed models in both single-GPU and multi-GPU environments and can diagnose performance issues across the stack.
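As a hedged sketch of the quantization-with-accuracy-safeguards idea above (not Liquid AI's method, and greatly simplified relative to real FP8/INT4 kernels), the snippet below applies symmetric per-tensor INT4-style quantization to a weight matrix in PyTorch and measures the reconstruction error one would gate on before shipping quantized weights.

```python
# Toy symmetric INT4-style weight quantization with an accuracy check.
# Illustrative sketch only, not a production FP8/INT4 kernel.
import torch

def quantize_int4(w: torch.Tensor):
    """Symmetric per-tensor quantization to 4-bit integers in [-8, 7]."""
    scale = w.abs().max() / 7.0          # map max magnitude to qmax
    q = torch.clamp(torch.round(w / scale), -8, 7)
    return q.to(torch.int8), scale       # int8 storage for 4-bit values

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(256, 256)
q, scale = quantize_int4(w)
w_hat = dequantize(q, scale)

# Guardrail: check reconstruction error before deploying quantized weights.
rel_err = (w - w_hat).norm() / w.norm()
print(f"relative reconstruction error: {rel_err:.4f}")
```

In practice, per-channel scales and calibration data typically tighten this error bound well beyond what per-tensor scaling achieves.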
What You'll Actually Do
- Optimize and productionize the end-to-end pipeline for GPU model inference around Liquid Foundation Models (LFMs).
- Facilitate the development of next-generation Liquid Foundation Models through the lens of GPU inference.
- Profile and harden the stack for different batching and serving requirements (a toy profiling harness is sketched after this list).
- Build and scale pipelines for test-time compute.
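A hedged sketch of the kind of profiling this involves: the harness below times a stand-in generate call across batch sizes and reports p50 and approximate p99 latency. The function names and workload are hypothetical placeholders, not Liquid AI's stack.

```python
# Minimal latency-profiling harness sketch across batch sizes.
import time
import statistics

def fake_generate(batch_size: int) -> None:
    # Stand-in for a model forward pass; replace with a real inference call.
    time.sleep(0.001 * batch_size)

def profile(batch_sizes, iters: int = 50) -> None:
    for bs in batch_sizes:
        samples = []
        for _ in range(iters):
            t0 = time.perf_counter()
            fake_generate(bs)
            samples.append((time.perf_counter() - t0) * 1e3)  # ms
        samples.sort()
        p50 = statistics.median(samples)
        p99 = samples[int(0.99 * (len(samples) - 1))]  # approximate p99
        print(f"batch={bs:3d}  p50={p50:6.2f} ms  p99={p99:6.2f} ms")

profile([1, 8, 32])
```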
What You'll Gain
- Hands-on experience with state-of-the-art technology at a leading AI company.
- Deeper expertise in machine learning systems and efficient large-model inference.
- The opportunity to scale pipelines that directly influence user latency and experience with Liquid's models.
- A collaborative, fast-paced environment where your work directly shapes our products and the next generation of LFMs.