Argmax, Inc.

Cloud Systems Engineer (Staff)

Argmax, Inc., Palo Alto, California, United States, 94306


About Argmax

AI applications are scaling in user adoption at unprecedented rates, and the infrastructure is crumbling: spinner wheels are back in fashion, sensitive user data is uploaded to the cloud and occasionally leaked, and spiky demand causes capacity crunches and wasted capacity at the same time. Argmax is building the critical infrastructure required to bring real-time AI workloads to the edge: instant autoscaling, privacy and compliance by design, and reliability beyond even that of multi-cloud platforms.

About the Role

We are looking for a Staff Engineer to join our growing Cloud Systems team. In this role, you will design, implement, and optimize systems that serve critical functions such as software licensing, large AI asset distribution, inference performance telemetry, and more. Although Argmax deploys AI workloads directly on user devices, these cloud systems are the backbone of Argmax SDK, our flagship product, and must be built to handle traffic from millions of devices worldwide with 99.9999% uptime. If this scale and reliability challenge excites you, read on!

Responsibilities

Prepare systems for 10x scale: You will proactively identify and implement improvements that harden our existing infrastructure and ensure it is ready for 10x higher traffic within the next year.

Architect multi-region expansion: As part of our ambition to deliver best-in-market reliability, you will lead Argmax's expansion beyond AWS to GCP, and potentially other cloud service providers (CSPs), as we adopt Kubernetes. The primary objective is to maintain >99.9999% reliability through redundancy while keeping costs efficient at scale and preserving our 95%+ gross margin.

Take new systems from 0 to 1: You will lead the design and implementation of new cloud systems that support the evolution of Argmax SDK, our flagship product. For example, Argmax SDK currently relies on third-party infrastructure for AI asset distribution, and one of your first projects will be to bring this infrastructure in-house and build an optimized global CDN that delivers large AI assets quickly and reliably worldwide.

Qualifications

3+ years of experience designing, building, and operating cloud systems that serve a large user base

Experience with container orchestration (Kubernetes) and cloud environments such as AWS or GCP

Fluency in at least one of Python, Go, or JavaScript

Familiarity with Django, FastAPI or equivalent

Preferred Qualifications

Experience leading production systems serving at least one million monthly active users or handling sustained high QPS

Experience participating in compliance programs such as SOC 2, including collecting the necessary evidence and communicating with independent auditors

Proven success scaling systems from 0 to 1 and maintaining performance and reliability at scale

Familiarity with multi-region database replication, CDN design, and cost optimization for large-scale systems

Why Argmax

Direct ownership of mission-critical systems supporting millions of devices

No-nonsense, meritocratic culture where career progression is limited only by how fast you make an impact on the product

Top-of-market equity at a fast-growing early-stage startup with a unique mission

Performance-based equity refreshers twice a year

3 days a week in our Palo Alto, CA office

The Palo Alto office offers comprehensive on-site amenities, including chef-catered meals

Remote work possible by exception for exceptional candidates who are recognized industry leaders

Platinum-tier healthcare with 90% employer contribution, including dependents

401(k) match

Quarterly in-person team-building weeks in Palo Alto, CA

Seniority Level: Mid-Senior level

Employment Type: Full-time

Job Function: Information Technology

Industry: Software Development
