Compunnel, Inc.

Site Reliability Engineer

Compunnel, Inc., Sun River, Montana, United States

We are seeking a highly skilled and experienced Site Reliability Engineer (SRE) to design, build, and maintain scalable, secure, and fault-tolerant cloud infrastructure. This role is critical in ensuring operational excellence, high availability, and reliability across systems. The ideal candidate will have a strong background in cloud platforms, DevOps practices, observability, and modern software development frameworks.

Key Responsibilities

Design and manage cloud infrastructure using AWS, Azure, or GCP. Automate deployments and configurations using IaC tools like Terraform, CloudFormation, and Ansible. Implement anomaly detection, self-healing mechanisms, and cost optimization strategies.

DevSecOps & CI/CD

Build and maintain CI/CD pipelines using GitLab, Jenkins, SonarQube, Nexus/Artifactory, and Docker. Apply DevSecOps principles and implement security best practices (IAM, RBAC, SAST/DAST/SCA).

Observability & Incident Management

Implement monitoring and logging solutions using AWS CloudWatch, Splunk, Dynatrace, and OpenTelemetry. Lead root cause analysis, postmortems, and incident response to minimize MTTR and MTTD. Define and monitor SLOs, SLIs, and error budgets.

Microservices & API Management

Architect and manage microservices and serverless APIs. Implement fault-tolerant patterns like Circuit Breaker, Retry, Timeout, and Bulkhead. Conduct chaos experiments using AWS FIS and Chaos Toolkit. Perform resiliency assessments and implement self-healing solutions.

Database & Application Support

Manage databases such as PostgreSQL, MongoDB, DynamoDB, Oracle, and Redshift. Provide production support, incident response, and maintain runbooks. Collaborate with cross-functional teams to implement shift-left testing (BDD, TDD). Maintain architecture diagrams, knowledge articles, and disaster recovery plans. Communicate effectively with stakeholders and manage relationships across teams.

Required Qualifications

8+ years of experience in SRE, DevOps, or related roles. Expertise in cloud platforms (AWS, Azure, or GCP) and container orchestration. Proficiency in Python, Java, Node.js, Bash, or PowerShell. Strong knowledge of database technologies (PostgreSQL, MongoDB, DynamoDB, Oracle, Redshift). Experience with CI/CD tools and build systems (Jenkins, Docker, Maven, Gradle). Familiarity with distributed systems, event-driven architecture, and AI/ML integrations. Expertise in observability tools and performance testing (JMeter, LoadRunner). Strong understanding of security practices and disaster recovery planning. Excellent communication and documentation skills. Bachelor’s or Master’s degree in Computer Science or related field.

Preferred Qualifications

Experience with AI/ML libraries (e.g., NLTK, Transformers, spaCy, SciPy), Amazon SageMaker, and GenAI tools. Familiarity with project management tools (JIRA, Confluence, ServiceNow). Proficiency with utilities like AWS CLI, Postman, and curl.

Certifications

AWS Solutions Architect, Agile Certified Practitioner (ACP), or other relevant cloud certifications (preferred).
