Request Technology, LLC

Site Reliability Engineer

Request Technology, LLC, Chicago, Illinois, United States, 60290

Site Reliability Engineer. Hybrid (3 days onsite, 2 days remote), full‑time. No visa sponsorship. Base pay: $150,000 – $155,000 per year, subject to skills and experience.

A prestigious company seeks a Site Reliability Engineer focused on observability, logging, and capacity planning. The role requires experience with Linux, Kubernetes/Docker, Terraform, Jenkins, Ansible, Harness, and Kafka.

Responsibilities

Collaborate with development, operations and infrastructure teams to ensure service availability and work through implementation issues

Develop automation to respond to incidents and prevent problem recurrence

Create and enhance runbooks to respond to service outages or degradations

Assess the production readiness of services

Define and track operational metrics for production performance, reliability, scalability and availability

Architect, develop and maintain shared services and tools to improve reliability and reduce toil across the organization

Qualifications

Bachelor’s or Master’s degree in Computer Science, Information Systems or a related field, or equivalent work experience

Minimum of 4 years of experience in Site Reliability Engineering / DevOps

Experience with maintaining and troubleshooting large‑scale distributed systems

Experience managing infrastructure in public cloud environments like AWS (preferred), Azure or GCP

Experience with AIOps and predictive analysis for anomaly detection and system capacity forecasting, using monitoring and alerting tools like Splunk, AppDynamics, Datadog, StackDriver, Sysdig, Prometheus or Grafana

Programming/scripting experience in languages like Java, Bash, Python or Go

Experience with distributed messaging systems such as Kafka, RabbitMQ, or ActiveMQ

Experience with container orchestration systems such as Kubernetes, Mesos, Docker Swarm or Rancher

Experience with CI/CD tools such as Jenkins, Travis, Harness, AppVeyor, CodeBuild or CodePipeline

Familiarity with leveraging large language models (LLMs) to automate and optimize SRE workflows, including scripting, incident report summarization, or AI workload maintenance

Seniority Level: Mid‑Senior
