Indotronix Avani Group

Site Reliability Engineer

Indotronix Avani Group, San Diego, California, United States, 92189

The Site Reliability Engineer (SRE) will work closely with cross-functional teams, including software development, platform, and operations, to support the availability and performance of our cloud-based systems. You will take ownership of the cloud infrastructure, support automation, and implement monitoring and alerting systems to proactively manage issues.

Responsibilities

Design, deploy, and maintain scalable, secure, and highly available cloud infrastructure on AWS and Azure.
Be proficient in infrastructure-as-code tools (Terraform, AWS CDK, and CloudFormation) and scripting languages (TypeScript, PowerShell, or Go).
Ensure cloud environments adhere to regulatory standards for healthcare data security (e.g., SOC 2 and ePHI compliance).

Observability and Monitoring

Implement, configure, and optimize Datadog for application and infrastructure monitoring, ensuring full-stack visibility into system performance.
Set up alerting mechanisms for critical metrics (e.g., system health, latency, error rates) and establish runbooks for incident response.
Develop and maintain dashboards to provide real-time insights into system performance.

Performance Optimization & Troubleshooting

Identify and resolve performance bottlenecks and ensure the reliability and scalability of production systems.
Perform root cause analysis for incidents and participate in on-call rotations to manage critical system incidents.
Drive improvements to system architecture, security, and disaster recovery strategies.
Work closely with development teams to incorporate CI/CD pipelines and foster a culture of “infrastructure as code” and automation.
Collaborate with security and compliance teams to ensure systems meet all regulatory and security requirements.
Promote best practices for software delivery, system monitoring, and infrastructure scalability.

Security & Compliance

Work with the compliance and cybersecurity teams to maintain healthcare data security, ensuring that systems are SOC 2 and ePHI compliant.
Implement security best practices within cloud environments, including encryption, IAM, and regular audits.

Qualifications

Bachelor’s degree in Computer Science, Engineering, or related field, or equivalent practical experience.
3+ years of experience as a Site Reliability Engineer, managing infrastructure on AWS and/or Azure.
Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, etc.).
Expertise in Terraform, CloudFormation, AWS CDK, or similar infrastructure-as-code technologies.
Proficiency in container orchestration and management (e.g., Docker, Kubernetes).
Knowledge of automation tools (e.g., Ansible, Puppet, Chef).
Familiarity with CI/CD pipeline tools such as Jenkins, GitHub Actions, or Azure DevOps.
Experience with healthcare data security and compliance (e.g., SOC 2 and ePHI requirements) is a plus.
Excellent problem-solving and troubleshooting skills.
Strong collaboration and communication skills.
