United Airlines
Description
Job overview and responsibilities
As a Sr. Engineer, you will be a self‑starter recognized as a technical expert in Observability Engineering, responsible for building high‑performance, next‑generation observability systems. You will combine a broad understanding of applications and environments with new engineering capabilities that improve and extend existing distributed solutions, solving critical observability engineering problems in both cloud and on‑premises environments. You will also participate in a 24×7 on‑call rotation and be accountable for all aspects of IT service delivery, from coding to scaling applications, performance tuning and post‑mortem analysis. This includes incident, problem and change management, and ensuring adherence to these processes. Lastly, as the Sr. Engineer, you will drive thought leadership and act as interim leader in the absence of the Sr. Manager, partnering with SRE and DevOps teams to define and implement observability and monitoring practices throughout the SDLC. The ideal candidate has deep technical expertise in Python/Java coding, Kubernetes and building cloud observability platform solutions.
Collaborate proactively with interdisciplinary teams across the IT department to identify and mitigate unplanned application downtime, and conduct thorough post‑outage root cause analysis that improves system designs for automated troubleshooting
Partner with Application Development, Site Reliability Engineering and DevOps teams to continuously refine application instrumentation, maximizing reliability and availability, enforcing best practices, optimizing systems, and defining and implementing SLIs, SLOs and SLAs
Continuously deepen your knowledge of the assigned application portfolio, including its architecture, usage patterns, performance trends, outages and business impact, and create strategies to proactively detect, report and prevent application performance problems and failures, mitigating operational risk
Build observability solutions aligned with long‑term goals, acting as a strong champion of observability principles
Consistently share best practices and improve processes within and across teams
Continuously monitor production environment availability, taking a holistic view of system health, service performance and availability, including real user monitoring, logging, distributed tracing and alerting for cloud and on‑premises systems
Engage with project teams to ensure that operational monitoring and instrumentation requirements are addressed by defining and implementing SLIs, SLOs and SLAs during application deployment
Develop expert‑level knowledge of observability toolsets to maintain and enhance our observability practices and solutions, improving the reliability, stability and performance of the digital platforms by driving fully automated telemetry that speeds problem identification and service restoration through automated alerting and intelligent, self‑healing response systems
Serve as a mentor to other team members, providing support and guidance in performing core functions and championing the adoption of observability practices
Qualifications
What’s needed to succeed (Minimum Qualifications):
Bachelor’s degree in computer science, information technology, or relevant field
4+ years in an IT organization with experience in observability and monitoring solutions
4+ years of experience with service management for cloud in a medium to large IT organization
Experience with AWS services such as EC2 (Elastic Compute Cloud), S3 (Simple Storage Service), RDS (Relational Database Service), VPC (Virtual Private Cloud), Lambda and CloudFormation
Proficiency with dynamic resource management frameworks (Kubernetes, YARN)
Experience with AWS networking services like VPC, Route 53, and CloudFront, with understanding of cloud concepts like IaaS, PaaS, and SaaS
Strong knowledge of Dynatrace APM (Application Performance Monitoring), including setup, configuration and optimization, and familiarity with Dynatrace’s AI‑driven analytics capabilities, extensions and plugins
Proficiency with DevOps practices and tools (CI/CD pipelines, Jenkins)
Ability to code (structured and OOP) using one or more high‑level languages, such as Python, Java, C# or JavaScript
Understanding of API management and integration services like API Gateway, and experience with RESTful and SOAP APIs
Dynatrace Associate Certification or AWS Certified DevOps Engineer certification required
Must be legally authorized to work in the United States for any employer without sponsorship
Successful completion of an interview is required to meet job qualifications
Reliable, punctual attendance is an essential function of the position
What will help you propel from the pack (Preferred Qualifications):
3+ years of experience with DevOps in a medium to large IT organization
2+ years of proven experience using Dynatrace and DQL (Dynatrace Query Language); large‑enterprise experience is a plus
1-2 years of experience leading small projects or teams