Compunnel

Site Reliability Engineer

Compunnel, Charlotte, North Carolina, United States, 28245

We are seeking an experienced Site Reliability Engineer (SRE) to support and enhance the resiliency, performance, and availability of our Digital Sales & Marketing platforms. This role involves production support, automation, dashboard creation, collaboration across engineering functions, and proactive performance monitoring. The ideal candidate will have a strong background in software engineering, DevOps practices, and SRE principles.

Key Responsibilities

- Develop, test, and automate processes to improve platform health and performance
- Monitor and manage application performance using APM tools such as Splunk, GCL, ELK, Grafana, AppDynamics, and Dynatrace
- Create dashboards and set up alerts for proactive incident response
- Collaborate with cross-functional teams, including Security, Networking, and Infrastructure, to resolve platform health issues
- Support both legacy and cloud-based infrastructures (e.g., PCF, Azure)
- Participate in production outage resolution, RCA creation, and implementation of permanent fixes
- Ensure SLAs and SLOs are met and drive continuous improvement of platform metrics
- Plan, support, and comply with governance and control processes
- Identify and escalate operational risks, process deficiencies, and data integrity issues
- Promote DevOps and SRE practices throughout the organization
- Participate in 12/7 support rotations and shift duties

Required Qualifications

- 8+ years of Software Engineering experience or equivalent (including military, training, or education)
- 5+ years of experience in production support/SRE teams
- Proficiency with Agile or other rapid development methodologies
- Hands-on experience with:
  - Automated testing and process automation
  - Java/J2EE, Spring, Spring Boot, Python, Shell scripting
  - Relational and NoSQL databases (Oracle, MongoDB)
  - Kafka, Redis, Messaging tools (MQ)
- Strong understanding of APM tools and dashboard creation (e.g., Splunk, GCL, ELK, Grafana)
- Experience with API styles (SOAP, REST, Microservices) and tools like Postman
- Proactive mindset for identifying performance bottlenecks and areas for improvement
- Excellent communication skills, with the ability to influence SRE best practices across teams

Preferred Qualifications

- Experience with on-prem and third-party cloud platforms (PCF, Azure)
- Experience supporting both modern and legacy systems
- Prior involvement in risk and governance program implementation
- Strong documentation and reporting skills
