Amadeus Search
Member of Technical Staff – LLMs
Amadeus Search, San Francisco, California, United States, 94199
Member of Technical Staff – Infrastructure & LLMs
Location:
San Francisco, CA (Hybrid)
Compensation:
$170,000 – $220,000 base + 1–3% equity
Work Authorization:
U.S. work authorization required (no visa sponsorship)
Start Date:
ASAP
Type:
Full-time
About the Role
We’re seeking a deeply curious and technically strong engineer to join a lean, high-performance team building next-generation inference infrastructure for LLMs. This is an opportunity to own the design and development of performance-critical systems from day one, working directly on problems like:
Scaling multi-GPU inference workloads
Designing distributed job schedulers
Experimenting with LLM distillation and optimization frameworks
You’ll join a two-person engineering team at the earliest stage, where your impact will be foundational to both product and culture. No bureaucracy. No politics. Just ambitious, technically challenging work that matters.
Why This Role is Unique
Massive Technical Ownership:
Drive core infra design with zero red tape.
Frontier Engineering:
Work on distributed systems, LLM runtimes, CUDA orchestration, and novel scaling solutions.
Foundational Equity:
Earn meaningful ownership and grow into a founding-level role.
Mission-Driven:
Focused on durable infra, not short-term hype cycles.
No Credentials Needed:
We value ability and drive over resumes and degrees.
Ideal Candidate Profile
2+ years of experience in backend or infrastructure engineering
Deep interest or experience in distributed systems, GPU orchestration, or AI infra
Strong technical curiosity demonstrated through side projects, OSS contributions, or community involvement
Background at infra-focused orgs (e.g., Supabase, Dagster, Modal, Lightning AI, MotherDuck)
Python fluency, with production experience in Docker, GPU workloads, and distributed compute systems
Tech Stack
Core Language:
Python
Infrastructure:
Custom distributed systems for multi-GPU inference
Deployment:
Docker, CUDA, Kubernetes (or equivalent)
Focus:
Batch inference, model distillation, low‑latency pipelines
Soft Traits
Fast learner with ownership mindset
Thinks from first principles, skeptical of default assumptions
Collaborative, positive‑sum team player
Oriented toward building, not credentialism