Varda Space Industries
Principal Site Reliability Engineer
Varda Space Industries, El Segundo, California, United States, 90245
About Varda
Low Earth orbit is open for business. Varda is accelerating the development of commercial space infrastructure, from in-orbit pharmaceutical processing to reliable and economical reentry capsules. Our mission is to expand the economic bounds of humankind. Our team includes veterans from SpaceX, Blue Origin, major pharmaceutical companies, and Silicon Valley. Varda was founded in January 2021 by Will Bruey and Delian Asparouhov with backing from investors including Khosla Ventures, Lux Capital, Founders Fund, Caffeinated Capital, General Catalyst, and Also Capital. Varda is headquartered in El Segundo, California, with offices in Washington, DC and Huntsville, AL (coming soon). Join Varda, and work to create a bustling in-space ecosystem.

About This Role
As a Principal Site Reliability Engineer, you will help set the technical vision and strategy for reliability across spacecraft, ground systems, and enterprise platforms. You’ll define standards, mentor senior engineers, and drive cross-organizational initiatives to ensure systems are highly operable, secure, and mission-ready. This role combines deep technical expertise with the ability to influence architectural direction at the company level.

Responsibilities
Lead and contribute hands-on to the deployment, maintenance, and operations of mission-critical applications and infrastructure supporting spacecraft, ground systems, and company-wide platforms.
Design, execute, and manage highly scalable, reliable, and operable software and infrastructure platforms, applying Infrastructure as Code (IaC) principles to drive automation, consistency, and repeatability across Kubernetes environments.
Collaborate closely with software and hardware teams to align reliability best practices, CI/CD pipelines, and compliance with their workflows, enabling faster, more secure deployments for mission-critical systems.
Anticipate and address reliability risks, capacity challenges, and performance bottlenecks; develop long-term strategies in partnership with leadership.
Rotate through the team’s on-call schedule to keep critical systems healthy and responsive.
Occasionally travel to customer sites and other Varda locations to troubleshoot, deploy, or test critical infrastructure.

Basic Qualifications
10+ years of experience in SRE, DevOps, or systems engineering, including leadership of large-scale, mission-critical systems.
Experience leading technical direction and architecture for large-scale systems.
Hands-on experience with observability stacks and telemetry pipelines, including metrics collection, alerting, and dashboards, for Linux systems and Kubernetes workloads (e.g., Prometheus and Grafana).
Strong background in systems architecture and software-defined networking (VPC, subnets, firewalls, VPNs, etc.).
Proficiency in automation and scripting with Python, Bash, or similar languages.
Positive and strong communication skills, both written and oral.

Preferred Skills and Experience
Expertise in time-series databases (e.g., InfluxDB) for large-scale telemetry pipelines.
Expertise in provisioning and managing scalable Azure cloud infrastructure using native tools and best practices (Azure GCC High preferred).
Experience with IaC tools like Terraform and Ansible, and CI/CD systems like Git and ArgoCD.
Experience building and maintaining dynamic system configurations with templating frameworks such as YAML and Helm.
Strong understanding of Linux systems, containerization technologies, and Kubernetes internals.

Additional Requirements
Must be physically able to regularly lift 25 lbs. for duties such as delivering computers, unpacking and rack-mounting equipment, etc.

Pay Range
Senior Site Reliability Engineer: $153,000.00 - $190,000.00 per year
Leveling and base salary are determined by job-related skills, education level, experience level, and job performance.
You will be eligible for long-term incentives in the form of stock options and/or long-term cash awards.

Varda is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. Candidates are evaluated based on merit, qualifications, and performance. We will not discriminate on the basis of race, color, gender, national origin, ethnicity, veteran status, disability status, age, sexual orientation, gender identity, or other legally protected status.

E-Verify statement: Varda Space Industries participates in the U.S. Department of Homeland Security E-Verify program to verify employment eligibility. Learn more about the E-Verify program.