Jobs via Dice

Site Reliability Engineer

Jobs via Dice, Ridley Park, Pennsylvania, United States

Description

This posting is for a contract assignment and is not a full-time employment offer with Boeing. Candidates selected for this role will be employed as contract workers through a Boeing-approved third party for the duration of the specified project.

Experienced DevOps/Site Reliability Engineer

Responsibilities

Maintain and improve the reliability, availability, and performance of production services, focusing on reducing incident frequency and recovery/restoration time.

Design, implement, and operate monitoring, alerting, logging, and tracing solutions to provide end-to-end visibility of systems and dependencies.

Respond to and resolve production incidents, participate in post‑incident reviews, and help implement corrective actions.

Build and maintain runbooks, standard operating procedures, and automation to reduce manual toil and improve operational consistency.

Collaborate with software engineers to optimize code for reliability, scalability, and resilience, and assist with capacity planning and performance tuning.

Implement and manage CI/CD pipelines, deployment strategies, and blue/green/canary release patterns to ensure safe and rapid software delivery.

Manage infrastructure and assist with provisioning, scaling, and maintaining cloud resources.

Enforce security and compliance best practices in the production environment, including access controls, secrets management, and secure logging.

Participate in the on-call rotation and communicate clearly with stakeholders about status and risks.

Contribute to reliability‑related projects, tooling, and initiatives that improve platform health and developer experience.

Infrastructure reliability and resilience: regularly assess and improve the reliability of core infrastructure components, with emphasis on redundancy, fault tolerance, and scalable failover strategies.

Participate in defining disaster recovery objectives (recovery point objective, RPO; recovery time objective, RTO), implement recovery capabilities (backup/restore, cross-region failover, site failover), and conduct regular exercises to validate recovery procedures.

Ensure robust backup/restore procedures, perform regular backup validation, and protect critical data across regions and environments.

Forecast growth, model failure domains, and ensure capacity buffers and scalable architectures to withstand regional outages or component failures.

Basic Qualifications

Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent practical experience).

5‑7 years of experience in DevOps or a related field.

Strong Linux/Unix administration skills and proficiency in at least one scripting language (e.g., Python, Bash).

Experience with at least one major cloud platform (AWS, Azure, or Google Cloud Platform).

Familiarity with containerization (Docker) and container orchestration (Kubernetes).

Experience with monitoring and observability tools (Prometheus, Grafana, ELK/EFK, OpenTelemetry).

Solid understanding of incident management processes, on‑call practices, and post‑mortem analysis.

Knowledge of CI/CD concepts and tooling (e.g., Jenkins, GitHub Actions, GitLab CI) and automation scripting.

Strong problem‑solving, debugging, and communication skills; ability to work in a collaborative, cross‑functional environment.

Preferred Qualifications

Bachelor's degree in Information Technology, Computer Science or a related field, or equivalent practical experience.

Experience in ITIL/ITSM environments, or a service management certification such as ITIL Foundation or equivalent, is a plus.

Knowledge of security requirements for DoD, other government, or similarly regulated environments is a plus.

1+ years of experience in the Aerospace industry.

Seniority level: Mid-Senior level

Employment type: Full-time

Job function: Engineering and Information Technology

Industries: Software Development