HireTo by Kuvaka Tech
Senior Cloud Infrastructure Engineer - VideoTech Experienced
HireTo by Kuvaka Tech, San Francisco, California, United States, 94199
Base Pay Range $175,000/yr – $250,000/yr
Compensation: $175k – $250k + Competitive Equity
Experience: 5–12 Years
Full-time | On-site
About The Role
We are looking for a Senior Cloud Infrastructure Engineer who thrives in fast‑paced environments and excels at building and scaling large‑scale GPU compute platforms. You will play a crucial role in architecting, developing, and operating the foundational infrastructure that powers advanced AI workloads. This role requires someone deeply technical, adaptable, and execution‑oriented, more focused on solving hard problems than matching exact tools.
What You’ll Do
Build and maintain the core Python‑based platform that handles request routing, AI workload orchestration, GPU server capacity management, observability, and more.
Develop and manage infrastructure using Terraform, Ansible, and cloud provider APIs, supporting GPU fleets across cloud and potentially bare‑metal environments.
Own and operate the platform’s foundational technologies, which may include Kubernetes (K8s), FluxCD, Nomad, Prometheus, Thanos, Grafana, Loki, distributed networking, and storage systems.
Architect and implement solutions that significantly improve the performance, scalability, and availability of services used by millions of users.
Collaborate closely with engineering teams to design and build new infrastructure systems end‑to‑end.
Drive the long‑term infrastructure roadmap (1‑, 2‑, and 5‑year planning) and influence best practices as the company scales.
Shape the technical direction of a highly ambitious engineering environment.
What We’re Looking For
5–12 years of experience as an Infrastructure Engineer, Cloud Engineer, SRE, or similar role.
Strong experience with Python, Linux, cloud platforms (AWS preferred), Kubernetes, and distributed systems.
Hands‑on experience with IaC tools such as Terraform and automation tools like Ansible.
Experience with monitoring/observability stacks: Prometheus, Loki, Grafana, Thanos, etc.
Strong problem‑solving ability, ownership mindset, and a bias for rapid execution.
Ability to work in a small, high‑performance team solving complex infrastructure challenges.
Willingness to relocate to San Francisco. Remote work is possible, but in‑person collaboration is preferred.
Tech Stack
Python, Kubernetes, Terraform, Ansible, AWS, Prometheus, Grafana, Loki, Thanos, Linux
Interview Process
Recruiter Screen
Introductory Call with Leadership
Technical Phone Interview
Additional Leadership Conversation
Onsite Technical Interview
Take‑Home Project
Reference Checks
Seniority level: Mid‑Senior
Employment type: Full‑time
Job function: Information Technology