StackAI

AI Infrastructure Engineer

StackAI, San Francisco, California, United States, 94199


Overview

We’re hiring an AI Infrastructure Engineer to shape and scale the backend systems that power our AI platform. Because StackAI is a Series A company, your work will be foundational, enabling safe, efficient, and reliable AI workflows from end to end.

Responsibilities

Design and implement scalable backend architectures for AI workloads (inference, orchestration, monitoring).
Own distributed job orchestration with Temporal and related systems.
Improve data pipeline performance by designing smarter caching strategies (e.g., file deduplication, hot/cold storage, Redis caching layers) to reduce redundant compute and API calls.
Build observability, monitoring, retries, and fault tolerance into all workflows.
Manage infrastructure reliability, incident response, and performance.
Develop tooling and platform infrastructure to support rapid growth.
Partner with ML engineers to bring models to production at scale.

Qualifications

4+ years of backend engineering (Python is a must).
Strong background in distributed systems, job orchestration, and task queues.
Deep knowledge of concurrency, parallelism, and multithreading, including async/await, event loops, thread pools, synchronization primitives, deadlocks, and race conditions.
Hands-on experience with Temporal, Redis, Airflow, Celery, RabbitMQ (or similar).
Experience with LLM serving and routing fundamentals (rate limiting, streaming, load balancing, budgets).
Comfortable with containers & orchestration: Docker, Kubernetes.
Familiarity with cloud platforms (AWS/GCP) and IaC (Terraform).
Experience with multiple storage systems: S3, Postgres, MongoDB, Redis, and Elasticsearch.
Track record scaling systems in startups or fast-paced environments.
Understanding of deploying, monitoring, and optimizing AI/ML systems in production with strong CI/CD practices.

Why You’ll Love Working Here

Play a foundational role at a fast-growing Series A startup that is shaping the future of AI in enterprise workflows.
Collaborate across Product, ML, and Platform teams, bridging AI logic and scalable execution.
Build infrastructure that enables real value for large enterprises: low-code, secure, and scalable AI workflows.
Join a company that values thoughtful scaling and developer experience.
