Apex Systems

Senior Site Reliability Engineer

Apex Systems, Fairfax, Virginia, United States, 22032

Our client is seeking a Senior Site Reliability Engineer to join the team building the next-generation Continuous Diagnostics and Mitigation (CDM) Cyber data solution. The CDM Program is the Cybersecurity and Infrastructure Security Agency's (CISA) dynamic approach to strengthening the cybersecurity of Federal networks and systems through better awareness of, and visibility into, their security posture and cyber threats.

Role & Responsibilities

The Senior SRE will define, implement, and grow our SRE practice to ensure the reliability, availability, and performance of critical production environments. Responsibilities include designing and maintaining resilient infrastructure; defining and measuring SLOs/SLIs; setting up comprehensive logging, monitoring, and alerting solutions using the Elastic Stack and other tools; responding to incidents and performing root cause analyses; and collaborating with cross-functional teams to integrate reliability and observability into the software development lifecycle.

Required Skills

- U.S. citizenship with the ability to obtain Public Trust suitability
- 6+ years of experience as a Site Reliability Engineer or equivalent
- 6+ years of demonstrated experience designing, implementing, and maintaining observability solutions (logging, monitoring, alerting)
- 6+ years of hands-on experience with SRE tools (Elastic, Prometheus, Grafana, Splunk, etc.)
- 3+ years of experience defining and measuring SLOs and SLIs
- 3+ years of relevant experience using cloud platforms (AWS GovCloud preferred)
- 3+ years of hands-on programming or scripting experience (Python, Bash, etc.)
- Strong knowledge of microservices, containerization, and orchestration tools (Docker, Kubernetes)
- Proven ability to collaborate with cross-functional teams to integrate reliability and observability into the software development lifecycle
- Strong problem-solving and analytical skills
- Proactive, detail-oriented approach to identifying inefficiencies and implementing improvements

Desired Skills

- Bachelor's degree in Computer Science, Engineering, or a related field (or 4 additional years of related experience)
- Experience working in an Agile/SAFe environment using ALM tools (Jira, Confluence, or similar)
- Strong understanding of CI/CD principles and platforms (Jenkins, CircleCI, GitLab, GitHub Actions, Argo, Travis CI, etc.)
- Expertise in configuration management tools (Ansible, Puppet, Chef)
- Experience with infrastructure as code (Terraform, CloudFormation)
- In-depth understanding of networking, security, and Linux system administration
- Knowledge of version control platforms and branching strategies
- Knowledge of disaster recovery planning, backup strategies, and data replication
- Experience supporting large Federal programs ($200M+)

EEO Statement

Apex Systems is an equal opportunity employer. We do not discriminate or allow discrimination on the basis of race, color, religion, creed, sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), age, sexual orientation, gender identity, national origin, ancestry, citizenship, genetic information, registered domestic partner status, marital status, disability, status as a crime victim, protected veteran status, political affiliation, union membership, or any other characteristic protected by law. Apex will consider qualified applicants with criminal histories in a manner consistent with the requirements of applicable law.