Intellias

Site Reliability Engineering Lead

Intellias, Poland, New York, United States


Site Reliability Engineering Expert

We are looking for a Site Reliability Engineering Expert to drive reliability, scalability, and performance across our systems and services. This role is ideal for someone with deep technical expertise in SRE principles, cloud‑native infrastructure, and incident management. You will act as a strategic advisor and hands‑on contributor, helping teams build resilient systems and improve operational excellence.

Project Overview

Our customer is a multinational corporation with more than a century of history and offices in over 180 countries. Their most ambitious goal at present is to introduce a range of Reduced‑Risk Products (RRPs). The target audience is more than 1 billion consumers around the globe. The IT platform hosts 700+ applications.

Intellias' mission is to help the client engineer a comprehensive software ecosystem for a game‑changing IoT product at the intersection of innovative consumer experience and cutting‑edge technology. Our teams are involved in the engineering of core platform components for best‑in‑class eCommerce, Digital Marketing, and IoT solutions. As a Cloud engineer, you will become part of the Core Architecture Team and be responsible for the architecture and the implementation of best practices in our Digital Engineering Enterprise Platform.

The Platform is a set of services and internet applications that accelerate the development and delivery of software applications by taking care of common SDLC challenges. It gives engineering teams access to a set of services, technologies, and practices for developing and operating their applications, and helps ensure compliance and adherence to best practices.

The project has been in production for 2+ years and is supported by multiple teams.

Technical Domains

AWS cloud, partially Azure

SSO, Organizations, Service Control Policies, access models

IaC: Terraform Enterprise, Terratest, Chalice

Serverless: Lambda, Step Functions, wide range of misc automations, Fargate

System, application, network, and security architectures

Orchestration: Kubernetes (EKS)

HashiCorp Vault

Hybrid Networking

Requirements

5+ years of experience in Site Reliability Engineering or related roles

Strong knowledge of SRE principles: SLIs, SLOs, error budgets, incident response

Experience with cloud platforms (AWS, GCP, Azure) and Kubernetes

Proficiency in observability tools (New Relic, Prometheus, Grafana, ELK, etc.)

Solid understanding of CI/CD pipelines, automation, and infrastructure‑as‑code (Terraform, Helm, ArgoCD)

Experience with incident management, post‑mortems, and reliability reviews

Excellent communication and mentoring skills

Nice to have

Certifications in SRE, cloud, or DevOps domains

Experience with chaos engineering and resilience testing

Familiarity with ITIL or other service management frameworks

Responsibilities

Define and promote SRE best practices across engineering teams

Lead reliability initiatives, including SLI/SLO definition and tracking

Design and implement scalable, fault‑tolerant systems

Collaborate on incident response, root cause analysis, and postmortem processes

Improve observability and alerting strategies for proactive issue detection

Mentor teams on reliability engineering principles and tooling

Drive automation to reduce toil and improve operational efficiency

Contribute to architectural decisions with a focus on reliability and performance

Seniority level Not Applicable

Employment type Full‑time

Industry IT Services and IT Consulting
