Care Dynamics Inc

Founding LLM Inference Engineer (replacement search, exclusive)

Care Dynamics Inc, San Francisco, California, United States, 94199

Full-time | On-site | San Francisco, CA

Compensation: $200K - $300K + 0.10%-1.00% Equity

About the Role

We're looking for a Founding LLM Inference Engineer to architect and optimize large-scale inference systems powering cutting-edge AI applications. You'll be building the backbone of an AI platform used by top enterprises, with a focus on performance, scalability, and reliability.

This is a hands-on, high-impact role where you'll collaborate closely with research and product teams, moving fast to bring breakthrough model capabilities into production. If you're excited about low-latency systems, high-throughput pipelines, and deploying bleeding-edge LLMs, this role is for you.

Tech stack:

Python, CUDA, LLMs, API integrations, TGI, vLLM, TensorRT-LLM

What You'll Do

Architect and implement scalable inference systems for state-of-the-art models
Optimize infrastructure for high throughput and low latency at scale
Develop and integrate advanced inference optimization techniques
Collaborate with research teams to productionize new model capabilities
Build developer tools and infra to support rapid experimentation and deployment

What We're Looking For

Deep expertise in LLM inference, optimization, and deployment at scale
Strong background in Python and GPU programming (CUDA)
Experience with serving frameworks (TGI, vLLM, TensorRT-LLM)
Proven track record of shipping production-grade AI systems
Excitement about building foundational infra at an early-stage AI startup

Benefits

Competitive salary + equity (0.10%-1.00%)
Health, dental, and vision insurance
Daily team lunches and wellness stipend
Unlimited PTO + flexible parental leave
On-site role in San Francisco (5 days a week)

Ready to take the next step?

Apply now or email Jenn at Recruiter@CareDynamicsFL.com to learn more.