Compunnel, Inc.
We are looking for a highly skilled Site Reliability Engineer (SRE) to join our team. In this critical role, you will leverage your strong technical background to support engineering and operational needs. You will be responsible for ensuring the reliability and scalability of our systems, addressing incidents, and improving infrastructure and observability. Your work will involve collaborating closely with various teams and continuously iterating on system improvements.
Key Responsibilities:
Monitoring & Observability: Implement and improve monitoring, alerting, and observability across systems using tools like Splunk, AppDynamics, ThousandEyes, and ExtraHop.
Infrastructure Orchestration: Work with enterprise-level infrastructure orchestration using Salt, Kubernetes, and IaaS to ensure robust and scalable systems.
Networking & System Administration: Apply your knowledge of networking protocols and components (e.g., DNS, DHCP, firewalls, load balancers, IP routing) to troubleshoot and optimize system performance.
Cloud Transition & Support: Assist with transitioning platforms to the cloud and provide support for GCP, Azure, and PCF technologies.
Database Support: Support and troubleshoot databases including Oracle, SQL Server, and MongoDB.
Collaborative Problem Solving: Collaborate with developers and architects to find the best solutions and iteratively improve system design.
Incident Management: Respond to alerts, escalations, and system recovery events while minimizing noisy alerts and improving system reliability.
System Debugging: Utilize excellent debugging skills across integrated platforms to resolve issues quickly and effectively.
Continuous Improvement: Continuously improve system reliability, availability, and performance by iterating on feedback loops and system design enhancements.
Required Qualifications:
Experience with Monitoring Tools: Working knowledge of Splunk, AppDynamics, ThousandEyes, and ExtraHop.
Networking Knowledge: Solid understanding of DNS, DHCP, firewalls, load balancers, and IP routing.
Database Familiarity: Experience with Oracle, SQL Server, and MongoDB databases.
Scripting & Programming: Experience with C#, .NET, Java, and scripting languages.
Infrastructure Orchestration: Extensive experience with Salt, Kubernetes, and IaaS for enterprise-level orchestration.
System Administration: Strong experience in Linux and Windows administration, troubleshooting, and support.
Cloud Technologies: Knowledge and experience transitioning platforms to the cloud, specifically with GCP, Azure, and PCF.
Atlassian Tools: Experience using Jira, Confluence, Bamboo, Bitbucket, and Harness for project management and collaboration.
Debugging Skills: Excellent debugging skills across a variety of integrated platforms.
Preferred Qualifications:
Strong cloud-native development experience and deep understanding of distributed systems.
Experience working in high-availability environments.
Experience with CI/CD pipeline development and tools.
Certifications (if any):
Cloud Certifications (e.g., AWS Certified Solutions Architect, Google Professional Cloud Architect, Azure Solutions Architect Expert) are preferred.
Kubernetes Certification (e.g., CKA, CKAD) is a plus.
Linux/Systems Administration Certifications (e.g., RHCE, CompTIA Linux+) are advantageous.