CardioOne
Hatch I.T. is currently partnering with CardioOne to find a Site Reliability Engineer…
About the Company
CardioOne partners with independent cardiologists to provide innovative solutions that improve patient outcomes and reduce costs. Our platform helps our physician partners thrive in today’s fee‑for‑service environment and prepare for success in value‑based care. In February 2024, we partnered with WindRose Health Investors as well as top physician services and payor executives to grow our team and invest in our next phase of growth.
CardioOne offers a magnificent work environment, good working conditions, and competitive pay. We offer medical, dental, and vision coverage, plus a 401(k) plan with a match, to benefits‑eligible employees. We offer PTO (Personal Time Off) and sick time to full‑time employees. We take pride in creating a culture of employee engagement that translates into an exemplary patient experience. Join us in our mission to positively impact US cardiology.
About the Job
We are seeking a highly skilled Site Reliability Engineer (SRE) to ensure the reliability, scalability, security, and performance of our production systems and services. The SRE will bridge the gap between software development and operations, implementing automation, monitoring, and best practices to enable rapid, reliable delivery of applications. You will report directly to the Senior Director of Engineering.
What you’ll do:
Reliability & Performance
Ensure high availability, scalability, and performance of production systems.
Implement and maintain SLIs, SLOs, and SLAs for critical services.
Conduct capacity planning and performance tuning.
Automation & Tooling
Automate infrastructure provisioning using IaC tools such as Terraform, Terragrunt, and Ansible.
Develop automation to minimize manual operations and improve deployment workflows.
Build CI/CD pipelines to support rapid and reliable deployments.
Monitoring & Incident Response
Design and maintain monitoring, logging, and alerting systems (Datadog).
Participate in on‑call rotations and lead incident response efforts.
Perform root‑cause analysis and develop postmortems to prevent recurring issues.
Systems Engineering
Manage cloud infrastructure (AWS, Azure) and container orchestration platforms (Kubernetes, ECS).
Optimize system architecture for reliability and fault tolerance.
Implement best practices for security, networking, and service resilience.
Collaboration & Leadership
Work closely with development teams to design reliable microservices and distributed systems.
Advocate for SRE principles and drive operational excellence across engineering teams.
Mentor engineers on reliability practices, tooling, and automation strategies.
What you’ll need:
Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
3–7 years of experience in SRE, DevOps, or Systems Engineering roles.
Strong proficiency with Linux systems and shell scripting.
Experience with cloud platforms (AWS, Azure).
Hands‑on experience with Kubernetes/ECS and container technologies (Docker).
Proficiency in at least one programming language: Python or Java.
Experience with CI/CD pipelines and DevOps tooling.
Strong understanding of distributed systems, networking, and security fundamentals.
Preferred Qualifications
Experience with observability stacks (OpenTelemetry).
Knowledge of database management (PostgreSQL).
Experience with configuration management tools (Ansible, Chef, Puppet).
Familiarity with zero‑downtime deployments and chaos engineering practices.
Soft Skills
Strong analytical and problem‑solving skills.
Excellent communication and cross‑team collaboration.
Ability to thrive in fast‑paced, high‑stakes environments.
A mindset focused on continuous improvement and operational excellence.
Work Location:
Remote: Colorado, Delaware, Florida, New Hampshire, New Jersey, New York, Pennsylvania, Texas.