TensorWave
Kubernetes Platform Engineer
At TensorWave, we're leading the charge in AI compute, building a versatile cloud platform that's driving the next generation of AI innovation. We're focused on creating a foundation that empowers cutting-edge advancements in intelligent computing, pushing the boundaries of what's possible in the AI landscape.

About the Role:
As a Kubernetes Platform Engineer focused on support and operations, you'll play a critical role in maintaining the stability and reliability of our bare-metal Kubernetes infrastructure. You will work closely with senior engineers, taking point on troubleshooting, incident response, and day-to-day cluster operations across multi-tenant workloads. This is a great opportunity for engineers ready to deepen their Kubernetes expertise while supporting cutting-edge AI environments in real time.

Responsibilities:
- Own and troubleshoot operational issues within Kubernetes environments
- Maintain and monitor core services (e.g., Cilium, HAProxy, Prometheus)
- Ensure uptime, performance, and reliability of multi-tenant clusters
- Assist with ingress/egress connectivity and network debugging
- Support internal and customer teams in secure, isolated VPC environments
- Collaborate with senior engineers on automation and cluster lifecycle improvements

Required Skills & Experience:
- 2–4 years of experience in DevOps, SRE, or Linux infrastructure roles
- 1+ years of hands-on experience with Kubernetes in production
- Familiarity with networking, CNI plugins, and core Linux troubleshooting
- Strong infrastructure-as-code mindset, using tools such as Helm, Terraform, or Ansible
- Solid experience with monitoring and logging tools (e.g., Prometheus, Grafana, Loki)
- Understanding of secure infrastructure design principles and least-privilege access
- Comfortable working in a team-oriented, fast-paced operational environment

Nice to Have:
- Experience with RKE2, Rancher, or similar platforms
- Experience troubleshooting or supporting AI or GPU-based workloads
- Familiarity with HAProxy, Cilium, or other Kubernetes ingress/networking tools

What We Bring:
In addition to a competitive salary, we offer a variety of benefits to support your needs, including:
- Stock Options
- 100% paid Medical, Dental, and Vision insurance
- Life and Voluntary Supplemental Insurance
- Short-Term Disability Insurance
- Flexible Spending Account
- 401(k)
- Flexible PTO
- Paid Holidays
- Parental Leave
- Mental Health Benefits through Spring Health