Hippocratic AI

Senior Site Reliability Engineer (Cloud Infra)

Hippocratic AI, Palo Alto, California, United States, 94306

About Us

Hippocratic AI has developed a safety-focused Large Language Model (LLM) for healthcare. The company believes that a safe LLM can dramatically improve healthcare accessibility and health outcomes in the world by bringing deep healthcare expertise to every human. No other technology has the potential to have this level of global impact on health.

Why Join Our Team

Innovative Mission: We are developing a safe, healthcare-focused large language model (LLM) designed to revolutionize health outcomes on a global scale.

Visionary Leadership: Hippocratic AI was co-founded by CEO Munjal Shah, alongside a group of physicians, hospital administrators, healthcare professionals, and artificial intelligence researchers from leading institutions, including El Camino Health, Johns Hopkins, Stanford, Microsoft, Google, and NVIDIA.

Strategic Investors: We have raised a total of $278 million in funding, backed by top investors such as Andreessen Horowitz, General Catalyst, Kleiner Perkins, NVIDIA’s NVentures, Premji Invest, SV Angel, and six health systems.

World-Class Team: Our team is composed of leading experts in healthcare and artificial intelligence, ensuring our technology is safe, effective, and capable of delivering meaningful improvements to healthcare delivery and outcomes.

For more information, visit www.HippocraticAI.com.

We value in-person teamwork and believe the best ideas happen together. Our team is expected to be in the office five days a week in Palo Alto, CA, unless explicitly noted otherwise in the job description.

About the Role

We are seeking a highly skilled Senior Site Reliability Engineer to join our team. In this role, your responsibilities will include designing and implementing infrastructure automation and continuous integration and delivery pipelines, and monitoring and scaling the infrastructure that powers our healthcare AI platform. You will work closely with software engineers, research scientists, and other cross-functional teams to develop and maintain reliable and scalable infrastructure that enables rapid iteration and deployment of our products.

Key Responsibilities

Design and implement infrastructure automation and deployment pipelines using tools such as Terraform, Ansible, and Jenkins

Implement and maintain monitoring and logging systems to ensure the reliability and performance of our healthcare AI platform

Work closely with software engineers to design and deploy scalable, fault-tolerant, and secure production systems on cloud platforms such as AWS, GCP, or Azure

Develop and maintain security and compliance policies and procedures for our healthcare AI platform

Collaborate with cross-functional teams to troubleshoot and resolve complex issues related to infrastructure, deployment, and operations

Implement and maintain disaster recovery and business continuity plans

Develop and maintain documentation related to infrastructure, deployment, and operations

Mentor and provide technical guidance to junior engineers

Qualifications

Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field

At least 5 years of professional experience in DevOps engineering or a related field

Expertise in infrastructure automation and deployment tools such as Terraform, Ansible, Jenkins, or GitLab CI/CD

Experience with cloud platforms such as AWS, GCP, or Azure

Strong knowledge of containerization technologies such as Docker and Kubernetes

Experience with monitoring and logging tools such as ELK, Grafana, or Datadog

Familiarity with security and compliance best practices and tools such as HashiCorp Vault, AWS KMS, or Azure Key Vault

Strong problem-solving skills and ability to work independently and collaboratively in a team environment

Excellent communication and interpersonal skills

Experience implementing HIPAA and SOC 2 compliance is a plus

Experience working in a high-performance computing (HPC) environment is a plus
