Coforge
Job Details
Job Title: Site Reliability Engineer
Location: Fort Mill, SC
Onsite: Yes
Employment Type: Full-time
Experience: 8 Years
Responsibilities
Proficiency in core SRE principles (CUJ, SLO, SLI, and error budgeting based on NFRs); apply these principles to ensure service reliability, meet business objectives, and drive continuous improvement.
Identify manual and repetitive tasks within the SDLC or IT operations and implement automation to reduce toil.
Streamline processes through automation and process improvement to enhance productivity and free resources for strategic initiatives.
Comprehensive proficiency in CI/CD practices, with robust knowledge of Git, GitHub Actions, and GitHub Workflows; familiarity with tools such as Jenkins is advantageous.
Engage in and improve the entire lifecycle of applications and cloud services, from inception and design through deployment, operation, and refinement.
Design, develop, ship, and enable software and systems that increase reliability and efficiency.
Lead the development and tracking of SRE error budgets and the development of SRE dashboards.
Lead root cause investigations; proactively identify system anomalies and automation opportunities.
Integrate into the software release cycle; work with developers to ensure releases are well designed, planned, implemented, released, and monitored.
Automate time-consuming manual processes.
Assess current SRE solutions and define the SRE approach for products; collaborate with application development teams to design, implement, and improve SRE practices.
Cloud platform expertise with strong knowledge of Infrastructure as Code (IaC), container orchestration, monitoring, and observability; solid understanding of AWS and related tooling.
Proven ability to design and implement monitoring solutions that ensure system uptime and performance.
Experience with AIOps principles and automation best practices.
Excellent communication, collaboration, and problem-solving skills.
Leverage industry-leading tools such as Dynatrace, Splunk, and Elastic Stack for real-time monitoring and troubleshooting.
Qualifications
Experience: 8 years in related roles.
Skills: .NET, SQL, React, Dynatrace, SolarWinds DPA, AWS Cloud, Splunk, Elastic Stack, Python, scripting languages, Ansible Tower, Terraform.
Strong knowledge of CI/CD, Git, GitHub Actions, and GitHub Workflows; familiarity with Jenkins is advantageous.
Proven ability to design and implement monitoring and observability solutions; experience with AIOps.
Seniority and Function
Seniority level: Mid-Senior level
Job function: Information Technology
Industry: IT Services and IT Consulting