Storm2
Senior Site Reliability Engineer
Location: New York, NY
Type: Full-time | Hybrid
Salary: $160,000 - $180,000 + 10% bonus
Base pay range: $160,000.00/yr - $180,000.00/yr
About Our Client
Our client is a rapidly growing technology company at the forefront of digital infrastructure innovation. They partner with leading organizations to deliver secure, scalable, and high-performance platforms that support millions of users nationwide. Their mission is to build and safeguard the technology backbone that powers the modern economy.
The Opportunity
As a Senior Site Reliability Engineer (SRE), you’ll play a key role in building and maintaining resilient systems that scale to meet future demand. You’ll work with cross-functional teams to design, automate, and optimize complex distributed systems, ensuring customers experience fast, reliable, and secure services at all times.
What You’ll Do
Drive reliability, automation, and scalability across mission-critical applications.
Build tooling and frameworks for deployment automation, configuration management, and disaster recovery.
Implement observability solutions (monitoring, alerting, tracing, and logging) to proactively identify and address issues.
Partner with engineering teams to influence software design, microservice best practices, and CI/CD workflows.
Diagnose and resolve performance bottlenecks in large-scale distributed systems.
Provide runbooks, documentation, and technical guidance to operational support teams.
Participate in a 24/7 on-call rotation and drive improvements that reduce incident frequency and impact.
Establish repeatable patterns and standards that improve service delivery and operational efficiency.
What We’re Looking For
Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience).
3+ years in a site reliability, DevOps, or systems engineering role within a mid-to-large-scale enterprise.
Solid knowledge of scripting (Python, Bash, Ruby, or similar).
Hands‑on experience with observability tools (e.g., Prometheus, Grafana, ELK, Datadog, New Relic).
Familiarity with CI/CD pipelines, Git, and modern software delivery practices.
Knowledge of networking protocols (TCP/UDP/IP) and security/encryption standards.
Excellent communication and collaboration skills, with the ability to work across engineering and product teams.
Preferred Qualifications
Experience supporting production applications in a 24/7 customer‑facing environment.
Strong background with containerization and orchestration (Docker, Kubernetes, Swarm).
Hands‑on experience with AWS or other cloud platforms.
Why Join
Competitive healthcare, dental, and vision coverage
401(k) with company match
Generous PTO and paid holidays, plus volunteer days
12 weeks paid parental leave