Kontakt.io
Lead Software Engineer - SRE
Kontakt.io is building the platform that care operations run on. We reduce waste, cut costs, and improve revenue by improving throughput, asset utilization and staff productivity. Our platform uses AI, RTLS, and EHR data to enable self-learning agents to automate workflows, adapt in real-time, and orchestrate all of care delivery operations.
Easy to deploy and scale, it gives a clear picture of spaces, equipment, and people, eliminating inefficiencies and enhancing the patient experience. With measurable 10X ROI and more than 20 use cases, Kontakt.io is the go-to platform for better and faster care delivery operations.
We are looking for a Lead Software Engineer - SRE with a strong software engineering foundation and a strategic mindset to drive the reliability, scalability, and performance of our platform. This role is part of our Infrastructure Engineering team and will play a central part in shaping the architecture and direction of our SRE function.
The ideal candidate brings a deep understanding of software engineering principles applied to infrastructure. Rather than only maintaining systems, you will lead their design and build, developing automation, tooling, and resilient architecture that enable high availability and fault tolerance across our entire AWS-based platform.
You’ll work hands‑on in designing resilient systems, improving deployment pipelines, and driving incident management practices. As a technical leader, you’ll also mentor engineers, shape technical strategy, and help build a culture of accountability, ownership, and continuous improvement across the organization.
Responsibilities
Lead the design and implementation of scalable, fault‑tolerant, and self‑healing infrastructure and services across AWS and Kubernetes
Collaborate with Product, Engineering, and Infrastructure teams to align SRE initiatives with business priorities and platform needs
Define and drive adoption of SLIs, SLOs, and SLAs to ensure consistent performance and high reliability across the platform
Own and evolve observability strategies using Prometheus, OpenTelemetry, Grafana, and related tooling
Design and maintain infrastructure as code (Terraform) and drive GitOps best practices
Oversee major incident response and on‑call practices, including incident reviews and long‑term remediation planning
Mentor and support the growth of SRE and platform engineers, fostering a culture of engineering rigor and operational excellence
Contribute to the long‑term reliability roadmap and architecture of high‑throughput, real‑time systems in healthcare operations
Drive process improvements in CI/CD, service ownership, chaos engineering, disaster recovery, and secure deployment
What You Bring
5+ years of experience in Site Reliability Engineering, Cloud Infrastructure, or Platform Engineering
5+ years of software engineering experience building production‑grade systems (Java, Python, Go, or similar)
Proven success scaling high‑traffic, mission‑critical platforms in SaaS, IoT, or healthcare environments
Deep expertise in cloud platforms (especially AWS), Kubernetes, and distributed system architecture
Hands‑on experience with monitoring, logging, and observability tools (Prometheus, OpenTelemetry, Datadog, etc.)
Extensive knowledge of CI/CD automation, GitOps workflows, and infrastructure‑as‑code (Terraform, Helm, ArgoCD)
A track record of leading major incident response and running postmortems with a blameless, learning‑focused approach
Strong understanding of networking, access control, and security within regulated environments (HIPAA, SOC 2)
A leadership mindset—able to drive cross‑functional alignment, lead initiatives, and mentor a high‑performance SRE team
Why You’ll Love It Here
Own Mission‑Critical Reliability – Ensure hospitals and care facilities always stay online with a 99.99% uptime healthcare platform
Scale AI‑Powered Infrastructure – Work on real‑time automation and self‑healing cloud systems that orchestrate care delivery
Drive Big Impact in Healthcare – Help reduce waste, optimize resources, and improve patient care with technology that delivers 10X ROI
Automation‑First Culture – Minimize manual ops with cutting‑edge automation, observability, and incident response strategies
Join a High‑Performing Team – Work with top engineers, AI experts, and healthcare innovators solving real‑world challenges
Ready to Build the Future of Healthcare? Apply now and help scale the platform that care operations run on.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are made by humans. If you would like more information about how your data is processed, please contact us.