DRC Systems

SRE Engineer / DevOps SRE Engineer

DRC Systems, Dallas, Texas, United States, 75215

Job Description

The Site Reliability Engineer (SRE) role bridges software engineering and systems administration. Beyond ensuring the reliability and performance of platforms, the role works with Development and Architecture teams to address quality gates, foundational architecture and stack components, metrics, trackers, baselines, and automated operations.

Location: Dallas, Texas (Hybrid)

Duration: Full‑time

Experience requirement: 10+ years

Key Responsibilities

Automation: Automate deployment, monitoring, and incident-response tasks using scripts, triggers, and workflow automation to improve efficiency.

Monitoring and Observability: Design instrumentation and define KPIs, metrics, events, and alerts to track system health and preempt issues.

Incident Response: Respond to and resolve incidents exceeding L1/L2 thresholds, coordinate with L3 teams, minimize downtime, and follow up on problem backlogs and shift‑left initiatives.

Infrastructure as Code: Use Terraform, Ansible, or similar tools to define and manage infrastructure for repeatable, scalable deployments.

Collaboration: Work closely with architecture, development, QA, testing, and operations teams to understand system requirements and enhance overall resilience.

Problem‑Solving: Apply strong analytical skills to diagnose and resolve complex issues.

Communication: Translate technical details into actionable insights for both technical and non‑technical stakeholders.

Soft Skills: Demonstrate teamwork, time management, and proactive problem identification.

Technical Skills

Programming: Python, Java, C/C++, or Ruby, plus IaC tools such as Ansible, Terraform, or cloud-native equivalents.

Cloud Platforms: AWS, Azure, or GCP.

Containerization: Docker and Kubernetes.

Networking and System Administration.

CI/CD: Jenkins, Harness, or Spinnaker.

Qualifications

10+ years of relevant experience.

Mid‑Senior level; full‑time.
