Bits to Atoms
Site Reliability Engineer

The Site Reliability Engineer (SRE)
will ensure the reliability, scalability, and performance of a hybrid (cloud + on-prem) platform. You’ll work at the intersection of infrastructure, AI/ML systems, and mission-critical physical operations. You’ll collaborate directly with engineering, AI, and operations teams to design resilient systems and bring cutting-edge AI models into production. This is a high-impact role: your work will directly shape how the world’s most advanced data centers operate.

Company Overview
Bits to Atoms has partnered with Fluix AI to fill its Site Reliability Engineer role. Fluix is building the AI operating system that plans, designs, and optimizes AI infrastructure. Based in Silicon Valley, Fluix specializes in AI-driven solutions for data centers and power providers, leveraging machine learning and AI technologies. Our mission is bold: to double America’s compute capacity without building new data centers.

Position Overview
The Site Reliability Engineer will ensure reliability, scalability, and performance across cloud and on-prem environments. You’ll work at the intersection of infrastructure, AI/ML systems, and mission-critical physical operations. You’ll collaborate with engineering, AI, and operations teams to design resilient systems and deploy AI models into production.

Who You’ll Work With
Chase Overcash, CTO

Responsibilities
Design, implement, and maintain scalable, fault-tolerant infrastructure across cloud and on-prem environments.
Build automation to streamline operations, reduce toil, and increase reliability.
Integrate ML/AI models into production systems and optimize their performance at scale.
Improve system resilience through monitoring, observability, and incident management.
Lead post-incident reviews, drive root-cause analysis, and implement preventative fixes.
Manage multi-environment cloud setups (dev, staging, prod) and optimize data center operations.
Ensure compliance and security across all infrastructure and applications.
Partner with engineering and data science teams to continuously improve deployment practices.

Qualifications
Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
Proven experience as an SRE, DevOps engineer, or similar role in a SaaS or infrastructure-heavy environment.
Strong expertise with cloud platforms (AWS preferred; GCP/Azure also valuable).
Proficiency in Python or similar scripting/programming languages.
Hands-on experience with containerization and orchestration (Kubernetes).
Solid understanding of networking, security, and performance optimization.
Familiarity with ML/AI infrastructure and data center operations is a strong plus.
Experience with CI/CD pipelines and infrastructure-as-code (Terraform, Ansible, etc.).
Excellent problem-solving skills and the ability to thrive in a fast-paced startup environment.

Culture Fit
Obsessed with solving hard problems and willing to dig deep.
Hands-on, comfortable with both physical and software systems.
Value being on-site and with clients, understanding the impact of mission-critical work.
Embrace flexibility, supporting teammates during weekends, holidays, or emergencies when needed.
Over-communicate, collaborate openly, and take ownership.

Why Fluix?
Competitive salary and equity package.
Comprehensive health, dental, and vision insurance.
Opportunities to shape the future of AI infrastructure and data center technology.
A collaborative, fast-paced environment in the San Francisco Bay Area.