Forward Deployed Engineer, AI Inference
Red Hat, Oklahoma City, Oklahoma, United States
Overview
The vLLM and LLM‑D Engineering team at Red Hat is looking for a customer‑obsessed developer to join as a Forward Deployed Engineer. You will bridge our cutting‑edge inference platform (LLM‑D and vLLM) with customers’ most critical production environments, deploying, optimizing, and scaling distributed LLM inference systems on complex Kubernetes clusters.
Responsibilities
Orchestrate Distributed Inference: deploy and configure LLM‑D and vLLM on Kubernetes clusters, including advanced deployment strategies such as disaggregated serving, KV‑cache‑aware routing, and KV‑cache offloading (a minimal deployment sketch follows this list).
Optimize for Production: run performance benchmarks, tune vLLM parameters, and configure intelligent inference routing policies to meet SLOs for latency and throughput, focusing on Time Per Output Token (TPOT), GPU utilization, networking, and scheduler efficiency.
Code Side‑by‑Side: write production‑quality code (Python/Go/YAML) that integrates the inference engine into customers’ Kubernetes ecosystems.
Solve the “Unsolvable”: debug complex interactions between model architectures (e.g., MoE, large context windows), hardware accelerators (NVIDIA GPUs, AMD GPUs, TPUs), and Kubernetes networking (Envoy/Istio).
Feedback Loop: serve as “Customer Zero,” channeling field learnings back to product development and influencing the roadmap for LLM‑D and vLLM features.
Travel as needed to customer sites to present, demo, or help execute proofs of concept.
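To make the deployment work above concrete, here is a minimal sketch of a single‑replica vLLM serving Deployment on Kubernetes. The image tag, model name, flag values, and GPU count are illustrative assumptions, not a reference configuration for LLM‑D or any customer environment.

```yaml
# Minimal sketch: single-replica vLLM OpenAI-compatible server on Kubernetes.
# Image tag, model name, flag values, and GPU count are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-inference              # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-inference
  template:
    metadata:
      labels:
        app: vllm-inference
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest                 # upstream vLLM serving image
          args:
            - --model=meta-llama/Llama-3.1-8B-Instruct   # hypothetical model choice
            - --tensor-parallel-size=1
            - --gpu-memory-utilization=0.90              # leave headroom for KV-cache growth
            - --max-model-len=8192
            - --enable-prefix-caching                    # reuse KV cache across shared prefixes
          ports:
            - containerPort: 8000                        # OpenAI-compatible API endpoint
          resources:
            limits:
              nvidia.com/gpu: 1
```

In practice, flags such as --gpu-memory-utilization and --max-model-len are among the first knobs to revisit when tuning TPOT and throughput against a given GPU’s memory budget.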
Qualifications
8+ years of engineering experience in Backend Systems, SRE, or Infrastructure Engineering.
Customer fluency: speak both “Systems Engineering” and “Business Value.”
Bias for action with rapid prototyping and iteration over theoretical perfection.
Deep Kubernetes expertise: fluent in K8s primitives, CRDs, operators, controllers, Gateway API ingress, stateful workloads, high‑performance networking, scheduler tuning for GPU workloads, and troubleshooting complex CNI failures.
AI inference proficiency: understand the LLM forward pass, KV caching, prefill/decode disaggregation, the impact of context length, and continuous batching in vLLM.
Systems programming: proficiency in Python (model interfaces) and Go (Kubernetes controllers/scheduler logic).
Infrastructure‑as‑code experience with Helm, Terraform, or similar tools.
Cloud & GPU hardware fluency: able to spin up clusters and deploy LLMs on both bare‑metal and hyperscaler Kubernetes environments.
Preferred: contributions to open‑source AI infrastructure projects (e.g., KServe, vLLM, Kubernetes); knowledge of Envoy Proxy or Inference Gateway (IGW), as in the routing sketch after this list; familiarity with model optimization techniques such as quantization (AWQ, GPTQ) and speculative decoding.
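As a hedged illustration of the Gateway API and inference‑routing skills above, the following sketch shows an HTTPRoute steering OpenAI‑style traffic to a vLLM backend. The Gateway name, route name, and backend Service/port are hypothetical placeholders.

```yaml
# Minimal sketch: Gateway API HTTPRoute sending /v1 traffic to a vLLM Service.
# Gateway, route, and backend names are hypothetical placeholders.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route                  # hypothetical route name
spec:
  parentRefs:
    - name: inference-gateway      # assumed pre-existing Gateway (e.g., Envoy-based)
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1             # OpenAI-compatible API prefix
      backendRefs:
        - name: vllm-inference     # Service fronting the vLLM Deployment
          port: 8000
```

Smarter policies such as KV‑cache‑aware or load‑aware routing typically layer on top of a route like this via an Envoy‑based gateway implementation, which is where the Inference Gateway knowledge mentioned above comes into play.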
Salary range: $189,600.00 – $312,730.00. The actual offer will be based on qualifications. Pay transparency applies; for Remote‑US locations the range may differ but remains commensurate with duties and experience.
Benefits
Comprehensive medical, dental, and vision coverage
Flexible Spending Accounts for healthcare and dependent care
Health Savings Account (available with the high‑deductible medical plan)
401(k) retirement plan with employer match
Paid time off and holidays
Paid parental leave plans for all new parents
Leave benefits including disability, paid family medical leave, and paid military leave
Additional benefits: employee stock purchase plan, family planning reimbursement, tuition reimbursement, transportation expense account, employee assistance program, and more
About Red Hat
Red Hat is the world’s leading provider of enterprise open‑source software solutions, delivering high‑performing Linux, cloud, container, and Kubernetes technologies. Our associates work flexibly across in‑office, office‑flex, and fully remote environments worldwide. We foster an open and inclusive culture to empower ideas from all backgrounds, drive innovation, and build collaborative solutions.
Equal Opportunity Policy (EEO)
Red Hat is proud to be an equal‑opportunity workplace and an affirmative action employer. We review applications without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, citizenship, age, veteran status, genetic information, disability, marital status, or any other basis prohibited by law. Red Hat does not accept unsolicited resumes or CVs from recruitment agencies and is not responsible for any fees or commissions paid to such agencies. We support individuals with disabilities and provide reasonable accommodations to applicants. For assistance with the online application, email application-assistance@redhat.com.