Tandem Inc.
Founding Site Reliability Engineer (Compliance, Security & Reliability)
Tandem Inc., Lehi, Utah, United States, 84043
Location
Lehi Office
Employment Type
Full time
Location Type
On-site
Department
Engineering
About the role:
We’re hiring our first SRE to build the operational foundation of our platform. You’ll own compliance readiness, security posture, and production reliability across our AWS + Kubernetes environment and our application stack (Next.js/Vercel, Sentry, Postgres). We deploy and manage services using Porter (porter.run) and infrastructure-as-code via Terraform. This is a hands‑on role for someone who can set direction, implement guardrails, and build scalable systems and processes without slowing product delivery.
Core responsibilities
Compliance (Core)
Lead audit readiness for frameworks such as SOC 2 (and HIPAA-aligned controls as needed): define controls, implement them, and run evidence collection.
Establish repeatable processes for access reviews, change management, incident management, vendor risk management, and secure SDLC practices.
Automate compliance workflows where possible (continuous controls monitoring, evidence generation, audit trails, policy templates).
Security (Core)
Own cloud security architecture in AWS and Kubernetes: least‑privilege IAM/RBAC, network segmentation, encryption standards, secrets management, and secure defaults.
Harden Kubernetes workloads: cluster baseline security, namespace boundaries, pod security standards, image provenance/scanning, and secure service‑to‑service communication.
Implement and tune security monitoring and incident response: centralized logging, actionable alerts, runbooks, on‑call workflows, and post‑incident reviews.
Drive vulnerability management across infra and app dependencies: patching, dependency scanning, container image scanning, and configuration drift detection.
Partner with engineering on threat modeling for major features and high‑risk changes.
Reliability (Core)
Define and own SLIs/SLOs, establish operational KPIs, and introduce error budgets where appropriate.
Improve observability across AWS + Kubernetes + apps using Sentry and monitoring best practices (metrics, logs, tracing, dashboards, alert routing).
Own production operations for Postgres: backups/restores, replication strategy, migration safety, performance tuning, and capacity planning.
Build resilience: disaster recovery planning, recovery testing, high‑availability patterns, and graceful degradation.
Infrastructure, Kubernetes & Delivery Enablement
Own infrastructure‑as‑code using Terraform: module standards, environment structure, state management, reviews, and guardrails.
Own the platform layer around Kubernetes and Porter (porter.run): cluster lifecycle practices, environment management, deployment workflows, and reliability of the delivery pipeline.
Improve CI/CD and deployment safety: progressive delivery, rollbacks, environment parity, and release observability.
Our stack
AWS, Terraform
Kubernetes, Porter (porter.run)
Next.js, Vercel
Postgres
Sentry
What success looks like (first 3–6 months)
Compliance roadmap is established and actively executed (audit evidence is increasingly automated).
AWS + Kubernetes have secure baselines: strong IAM/RBAC, secrets management, encryption defaults, and centralized logging.
SLOs exist for key services, incidents are handled consistently, and postmortems drive measurable reliability gains.
Postgres has tested backups/restores, solid monitoring, and a scaling/reliability plan.
Porter/Kubernetes delivery workflows are reliable, observable, and safe to operate.
Qualifications
6+ years in SRE / Platform / Security Engineering (or similar), owning production systems end‑to‑end.
Strong experience with AWS plus hands‑on Kubernetes operations in production.
Strong Terraform experience (modules, environments, drift control, guardrails).
Experience leading or significantly contributing to SOC 2 (preferred) and/or HIPAA‑aligned operational controls.
Proven incident leadership: on‑call maturity, clear runbooks, effective postmortems.
Hands‑on experience operating Postgres in production.
Nice to have
Experience implementing Kubernetes security best practices (network policies, admission control, policy‑as‑code, supply chain security).
Familiarity with compliance/security frameworks (NIST/ISO‑style controls), vendor risk, and audit coordination.
Experience with Vercel/Next.js operational performance tuning.
Working model
This is an in‑office role in Lehi, Utah, partnering closely with engineering leadership to embed security, compliance, and reliability into how we build.