CACI International
Cloud Reliability Engineer
The Opportunity: We're looking for a Cloud Reliability Engineer to drive the design, build, and support of cloud-native infrastructure and platform services for critical Department of Defense mission systems. This isn't just about operations; it's about owning the reliability of the product, directly impacting user outcomes. You'll define, measure, and enhance reliability using frameworks like Critical User Journeys (CUJs), collaborating with product stakeholders, DevSecOps, and cybersecurity teams.
Job Responsibilities:
Engineer Cloud-Native Platforms: Design, deploy, and maintain robust Kubernetes clusters and supporting services across AWS GovCloud and Azure.
Drive User-Centric Reliability: Collaborate to understand Critical User Journeys (CUJs). Define and implement product-level Service Level Objectives (SLOs), focusing on user-visible behaviors and outcomes (availability, latency, etc.).
Automate Everything: Provision, configure, and monitor platforms using Infrastructure-as-Code (Terraform, CloudFormation).
Enhance Observability: Implement and leverage telemetry and request-level annotation to directly link infrastructure requests to product functionality and mission partner objectives.
Secure & Comply: Manage identity, access, patching, logging, and backups in multi-tenant environments, integrating RMF, Zero Trust, and IL5+ hardening into platform design.
Troubleshoot with Impact: Prioritize and resolve platform service and infrastructure issues based on user impact and product criticality.
Collaborate & Document: Work within Agile teams, contribute to user objective refinement, and maintain comprehensive system documentation.
Qualifications:
Required:
Active TS/SCI Clearance.
Bachelor's degree in a technical field with 3+ years of relevant experience.
Deep expertise with AWS (GovCloud/SC2S), Kubernetes (EKS or self-managed), Linux, and CI/CD tools.
Proficiency in Bash or Python.
Hands-on experience with Git/GitLab, container registries, and infrastructure monitoring.
Strong understanding of cloud security, IAM, networking, and platform lifecycle management.
Proven ability to translate user needs into measurable reliability targets and implement user-focused SLOs.
Excellent communication and troubleshooting skills, with a focus on end-user experience and product reliability.
Solid grasp of cloud networking, load balancing, and DNS.
Certifications: CompTIA Cloud+ or Security+; GICSP, SSCP, or GSEC.
Desired:
Master's degree in a technical discipline.
Experience with Air Force or DoD platform infrastructure environments (e.g., Platform One, Iron Bank, Big Bang).
Familiarity with Atlassian tools and DevSecOps workflows.
What You Can Expect:
A culture of integrity. At CACI, we place character and innovation at the center of everything we do. As a valued team member, you'll be part of a high-performing group dedicated to our customer's missions and driven by a higher purpose: to ensure the safety of our nation.
An environment of trust. CACI values the unique contributions that every employee brings to our company and our customers every day. You'll have the autonomy to take the time you need through a unique flexible time off benefit and have access to robust learning resources to make your ambitions a reality.
A focus on continuous growth. Together, we will advance our nation's most critical missions, build on our lengthy track record of business success, and find opportunities to break new ground in your career and in our legacy.
Your potential is limitless. So is ours.
The proposed salary range for this position is: $69,100-$141,500.
CACI is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, pregnancy, sexual orientation, age, national origin, disability, status as a protected veteran, or any other protected characteristic.