Zettabyte
Why this role exists
We’re looking for an Operations Engineer to help us design, build, and maintain the infrastructure powering our 10,000+ GPU cloud platform. You’ll be responsible for keeping our systems highly available, secure, and performant while working closely with backend, frontend, and infrastructure teams to enable rapid development and deployment. As part of the operations team, you’ll lead efforts in automation, monitoring, scaling, and reliability engineering to support our fast-growing user base and platform demands. This is an ideal opportunity for someone excited to take ownership, drive large-scale deployments, move fast, and help shape the foundation of a high-impact AI startup.
What you’ll do
Design, build, and maintain scalable infrastructure across multi-cloud and on-prem GPU environments.
Develop automation scripts and tools for provisioning, monitoring, and managing systems.
Implement robust CI/CD pipelines to support rapid development and deployment cycles.
Monitor and improve system performance, reliability, and security.
Troubleshoot infrastructure issues and respond to incidents with a focus on root cause analysis.
Collaborate with engineering teams to ensure seamless integration of backend systems and APIs.
Leverage AI-assisted coding and DevOps tools (e.g., GitHub Copilot, ChatGPT, Cursor IDE) to accelerate operations workflows and increase reliability.
You’ll thrive here if you have
3+ years of experience as a DevOps, SRE, or Operations Engineer in production environments.
Proficiency with cloud platforms (AWS, GCP, Azure) and infrastructure-as-code tools (Terraform, Ansible, Pulumi, etc.).
Strong experience with containerization and orchestration (Docker, Kubernetes).
Knowledge of networking, security, and distributed systems.
Experience building and maintaining CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI, etc.).
Familiarity with monitoring and observability tools (Prometheus, Grafana, ELK stack, etc.).
Experience using AI-assisted coding tools (e.g., Copilot, ChatGPT) and openness to integrating them into daily workflows.
Startup mindset: self-motivated, comfortable with ambiguity, and excited to wear multiple hats.
Compensation
Competitive salary: commensurate with your experience and aligned with industry standards.
Meaningful equity: be part of the upside as we build a category-defining company. Your grant will align with your role and the experience you bring.