Expedite Technology Solutions LLC
US_East | Infrastructure Engineer_L2
Expedite Technology Solutions LLC, Chicago, Illinois, United States, 60290
Overview
Role: Site Reliability Engineer
Work Location: Chicago, IL
Job Description
Senior-level SRE responsible for ensuring reliability, performance, and scalability of GCP-based platforms supporting a global cloud environment. Focus on automation, observability, and incident response for mission-critical applications.
Responsibilities
Advanced monitoring and observability (Prometheus, Grafana, New Relic, Datadog)
Incident management and post-mortem analysis
SLI/SLO definition and measurement
Chaos engineering and reliability testing
Performance tuning and capacity planning
Automation and scripting (Python, Go, Bash)
Infrastructure as Code (Terraform, Ansible)
Container orchestration (Kubernetes, Docker)
CI/CD pipeline design and implementation
Microservices architecture and distributed systems
Load balancing and traffic management
Database performance optimization
Compute: GCE, GKE, Cloud Run, App Engine
Monitoring: Cloud Operations Suite (formerly Stackdriver), Cloud Logging, Cloud Monitoring
Networking: VPC, Cloud Load Balancing, Cloud CDN
Storage: Cloud Storage, Persistent Disks, Cloud SQL
Security: IAM, VPC Security, Cloud KMS
Cloud Trace and Cloud Profiler for APM
Cloud Deployment Manager and Cloud Build
Anthos for hybrid/multi-cloud management
Error Reporting and Cloud Debugger
BigQuery for log analysis and metrics
Qualifications
Minimum: 5+ years in Site Reliability Engineering or Platform Engineering
Preferred: 7+ years with enterprise-scale cloud environments
Industry: Experience in high-availability, customer-facing systems preferred
Other SubCo Staffing Center – North America