Motion Industries
Birmingham, AL, USA - Full time
Site Reliability Engineer III
Under limited supervision, the Site Reliability Engineer III is responsible for improving system reliability and resilience. This role focuses on building automation to reduce manual effort and prevent service-impacting incidents. The SRE combines software and systems engineering to build and support large-scale, distributed, fault-tolerant systems. This role ensures that critical platforms are available, reliable, and able to support a fast rate of improvement. The SRE will enhance and support cloud-based transformations and is focused on pushing capabilities forward, staying ahead of customer needs, and innovating for continuous improvement. The SRE provides operational support and engineering for multiple large-scale distributed software applications.

JOB DUTIES
Gathers and analyzes metrics from monitoring platforms to assist in performance tuning and fault tolerance.
Partners with development teams to improve services through testing and release procedures.
Participates in system design, platform management, and capacity planning.
Balances feature development speed and reliability against service-level objectives.
Works closely with the incident response team to restore service to normal operation.
Debugs issues and applies troubleshooting skills.
Investigates, blocks, and rate-limits unwanted traffic.
Utilizes monitoring systems and dashboards for proactive changes and alerting.
Establishes continuous process improvement cycles in which the process, performance, and supporting technologies are reviewed and enhanced where applicable.
Performs other duties as assigned.

EDUCATION & EXPERIENCE
Typically requires a bachelor's degree and five (5) or more years of related experience or an equivalent combination.

KNOWLEDGE, SKILLS, ABILITIES
Understanding of Kubernetes, containers, clusters, and elastic scalability.
Expertise in SRE principles.
Mindset of continually finding ways to drive scalability, stability, and performance.
Cloud services experience with Google Cloud Platform (GCP).
Experience with API, service-based, or microservice-based architecture.
Proficiency in infrastructure, network, database, operating system, or security troubleshooting and remediation.
Architecture-level knowledge of Windows, Linux, and infrastructure systems.
Experience with production deployment, monitoring, and operational support for enterprise-class applications (Dynatrace a plus).
Experience working with Continuous Integration/Continuous Deployment (CI/CD) tools.
Experience in performance diagnostics, capacity planning, performance architecture design, performance tuning, and performance monitoring.
A strong mix of software engineering and operational support skills.
Knowledge of web technologies such as HTTP, proxies, and Java.
Experience with Azure DevOps (ADO), Dynatrace, Prometheus, Terraform, and Grafana.

Motion offers an excellent benefits package which includes options for healthcare coverage, 401(k), tuition reimbursement, vacation, sick, and holiday pay.

GPC conducts its business without regard to sex, race, creed, color, religion, marital status, national origin, citizenship status, age, pregnancy, sexual orientation, gender identity or expression, genetic information, disability, military status, status as a veteran, or any other protected characteristic. GPC's policy is to recruit, hire, train, promote, assign, transfer, and terminate employees based on their own ability, achievement, experience, and conduct and other legitimate business reasons.

Where permitted by applicable law, successful applicants must be fully vaccinated against COVID-19 prior to start date. COVID-19 vaccination is a condition of employment, subject to an approved accommodation, and proof of vaccination will be required on or prior to start date.

Equal employment opportunity, including veterans and individuals with disabilities.