R Systems International Limited
Senior Site Reliability Engineer
R Systems International Limited, Poland, New York, United States
R Systems is seeking an experienced Senior Site Reliability Engineer to design, build, and operate resilient, scalable, and secure systems across multi-cloud environments. The role emphasizes AWS expertise (80%) with a strong Azure foundation (20%). You will lead initiatives in automation, observability, incident management, and release reliability to ensure mission-critical applications run smoothly at enterprise scale.
Responsibilities
Cloud Infrastructure (AWS & Azure)
Architect, implement, and manage highly available, fault-tolerant infrastructure.
AWS (primary): EKS, ECS, Lambda, API Gateway, S3, RDS, DynamoDB, IAM, CloudWatch, CloudTrail, CloudFormation/Terraform.
Azure (secondary): AKS, App Services, Azure Functions, Azure Monitor, Azure DevOps Pipelines.
Implement best practices for multi-cloud security, networking, and DR/BCP.
SRE & Reliability Engineering
Define and maintain SLIs, SLOs, and SLAs across distributed systems.
Conduct capacity planning, fault-tolerance reviews, chaos engineering, and DR drills.
Lead incident response, on-call rotations, and blameless postmortems.
Continuously optimize performance, cost, and reliability.
Automation & Infrastructure as Code (IaC)
Automate infrastructure provisioning with Terraform, Helm, Ansible, and GitOps workflows.
Design and maintain CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI, Azure DevOps).
Enforce policy-as-code and integrate security & compliance automation.
Observability, Monitoring & Telemetry
Build comprehensive monitoring and observability solutions: CloudWatch, Prometheus, ELK/EFK, Datadog, Grafana, Splunk, New Relic.
Implement centralized logging, distributed tracing, and OpenTelemetry standards.
Enable proactive alerting, anomaly detection, and automated remediation.
Release & Incident Management
Collaborate with DevOps and engineering teams to ensure reliable, safe, and repeatable releases.
Implement blue/green, rolling, and canary deployment strategies.
Drive root cause analysis (RCA), knowledge sharing, and preventive engineering.
Establish incident playbooks and integrate with ITSM tools (ServiceNow, PagerDuty, Opsgenie).
Qualifications
7+ years in SRE / DevOps / Cloud engineering roles.
Deep AWS expertise (80%) with working knowledge of Azure (20%).
Strong proficiency with Kubernetes (EKS/AKS), containers, and microservices.
Hands-on experience with Terraform, Helm, CI/CD platforms, and observability stacks.
Solid foundation in networking, IAM, cloud security, and compliance (SOC 2, HIPAA, NIST).
Proven track record of handling high-severity incidents and driving RCA.
Preferred Certifications
AWS Solutions Architect – Professional
Azure Solutions Architect Expert
Certified Kubernetes Administrator (CKA)
What’s In It For You
Hybrid work policy with equipment provided to support work-life balance.
Health coverage with private medical subscription.
Professional development with access to Udemy and paid study time for eligible learners.
Referral bonuses and long-term contribution rewards.