Insight Global

Site Reliability Engineer - Azure

Insight Global, Chandler, Arizona, United States, 85249

Overview

The resource will be part of a team responsible for the reliability and support of Container (OpenShift) platforms on-prem and in external clouds (MS Azure/AWS/Google). This includes monitoring and troubleshooting alerts and incidents related to the platforms, any required Incident and Problem Management, and application onboarding, troubleshooting, and support throughout the lifecycle. The role requires weekend on-call coverage and shift coverage as part of a 24x7 Global Ops team. The resource will liaise regularly with teammates and shift leads and, as part of support, will routinely interact with platform clients and vendors.

Responsibilities

Monitor and troubleshoot alerts and incidents related to Container (OpenShift) platforms on-prem and in external clouds (Azure/AWS/Google). Perform Incident and Problem Management as required. Onboard applications and provide troubleshooting and support throughout their lifecycle. Communicate with team members, shift leads, platform clients, and vendors as part of the support process. Provide weekend on-call coverage and participate in 24x7 Global Ops shift coverage as needed.

Qualifications

BS / MS degree in Computer Science or a related technical field involving systems, or equivalent practical experience. 5+ years of hands-on experience supporting Kubernetes / OpenShift / RKE / EKS container platforms. Experience with Python, Ansible, Golang, and shell scripting. Kubernetes / OpenShift / Terraform certifications are a plus. Strong experience in Compute, Storage, Network, and Security services. Experience with monitoring tools like Prometheus and Dynatrace, and cloud-native tools like Azure Monitor and Log Analytics. Understanding of complex IAM infrastructure (Active Directory, Azure AD Connect, Azure AD, Ping Identity, or other SSO solutions). Advanced knowledge of Linux OS, DNS, DHCP, Kerberos, and Windows Authentication. Experience with CI/CD tools (Git / Jenkins) and the GitOps model. Excellent Linux/Windows system administration skills. Experience in container security and remediation. Systematic problem-solving, ownership, and ability to manage competing priorities. Excellent interpersonal, organizational, and communication skills (written, verbal, and presentation). Proven ability to work independently and as part of a team with direct responsibilities. Experience in OpenShift, RKE, and CSP Kubernetes services such as AKS and EKS. Experience in Terraform, ArgoCD, Tekton, and Knative technologies. Experience with agile deployment methodologies (GitOps). Knowledge of various container runtimes and familiarity with the operator deployment pattern. Experience working in a highly available multi-datacenter environment. Experience with monitoring tools such as Prometheus, Splunk, Dynatrace, Sysdig, or similar tools. Understanding of cost management, inventory management, and the FinOps model.

Equal Opportunity

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
