Compunnel
We are seeking an experienced Site Reliability Engineer (SRE) to support and enhance the resiliency, performance, and availability of our Digital Sales & Marketing platforms.
This role involves production support, automation, dashboard creation, collaboration across engineering functions, and proactive performance monitoring.
The ideal candidate will have a strong background in software engineering, DevOps practices, and SRE principles.
Key Responsibilities
Develop, test, and automate processes to improve platform health and performance
Monitor and manage application performance using APM tools like Splunk, GCL, ELK, Grafana, AppDynamics, and Dynatrace
Create dashboards and set up alerts for proactive incident response
Collaborate with cross-functional teams including Security, Networking, and Infrastructure to resolve platform health issues
Support both legacy and cloud-based infrastructures (e.g., PCF, Azure)
Participate in production outage resolution, RCA creation, and implementation of permanent fixes
Ensure SLAs and SLOs are met and drive continuous improvement of platform metrics
Plan, support, and comply with governance and control processes
Identify and escalate operational risks, process deficiencies, and data integrity issues
Promote DevOps and SRE practices throughout the organization
Participate in 12/7 support rotations and shift duties
Required Qualifications
8+ years of Software Engineering experience or equivalent (including military, training, or education)
5+ years of experience in production support/SRE teams
Proficiency with Agile or other rapid development methodologies
Hands-on experience with:
  Automated testing and process automation
  Java/J2EE, Spring, Spring Boot, Python, Shell scripting
  Relational and NoSQL databases (Oracle, MongoDB)
  Kafka, Redis, Messaging tools (MQ)
Strong understanding of APM tools and dashboard creation (e.g., Splunk, GCL, ELK, Grafana)
Experience with API styles (SOAP, REST, Microservices) and tools like Postman
Proactive mindset for identifying performance bottlenecks and areas for improvement
Excellent communication skills with ability to influence SRE best practices across teams
Preferred Qualifications
Experience with on-prem and 3rd party cloud platforms (PCF, Azure)
Experience supporting both modern and legacy systems
Prior involvement in risk and governance program implementation
Strong documentation and reporting skills