Brooksource
Responsibilities
Ensure key stakeholders, product owners, and platform owners are informed of reliability concerns and their potential impact on the customers' experience.
Design, code, test, and deliver solutions that automate manual operational work (i.e., toil).
Participate in operations support and on-call rotations (which may include weekends and holidays) for SRE-supported systems and products, with a focus on implementing long-term solutions to any problems identified.
Collaborate with stakeholders, such as product and platform owners, to define service-level objectives (SLOs) and service-level indicators (SLIs) for system operations, focused on the critical features of the customer journey and experience.
Track and manage reliability performance against agreed SLOs, in partnership with IT monitoring teams or other stakeholders, and ensure systems continue to meet SLOs over time.
Provide expert knowledge of reliability approaches to ensure the organization achieves its reliability goals and roadmap.
Champion treating reliability as a feature of products and platforms, and promote the concept across all phases of the software development life cycle.
Create dashboards and reports to communicate key metrics to product owners and key stakeholders.
Contribute to documentation and runbooks for owned applications based on operational experience, user feedback, and application changes.
Qualifications
Able to design, develop, and maintain automated test frameworks using Cypress/Playwright for web and API testing.
Able to build and maintain CI/CD pipeline integrations in Azure DevOps and GitHub Actions to ensure reliable automated testing.
Bachelor's degree in Computer Science, Computer Engineering, Technology, Information Systems (CIS/MIS), Engineering or related technical discipline, or equivalent experience/training.
At least 1 year of experience designing, developing, and implementing large-scale solutions in production environments.
Master's degree in Computer Science, Computer Engineering, Technology, Information Systems (CIS/MIS), Engineering or related technical discipline, or equivalent experience/training (preferred).
Airline industry experience (preferred).
We will consider junior developers who can demonstrate a passion for development and processes.
Nice-to-Have Qualifications (what you will be learning)
Dynatrace (APM/monitoring)
Nucleus (security & vulnerability management)
Understanding of production observability and incident management concepts.
Eager to learn, grow SRE skills, and take ownership across both the testing and reliability domains.
A passion for improving processes and building reliable systems.
Proven ability to work independently and take initiative with minimal guidance.
Strong background in quality engineering, with a solid understanding of automation best practices.
Familiarity with SRE concepts such as monitoring, alerting, and incident response.
Familiarity with navigating and managing resources in Azure cloud and Kubernetes environments.
Define and execute comprehensive test strategies for new features and services.
Implement and manage CI/CD workflows using GitHub Actions or other GitHub-integrated tools.
Build and maintain CI/CD pipeline integrations in Azure DevOps and GitHub Actions to ensure reliable automated testing.
Conduct regression, performance, and security testing.
Collaborate with developers and product managers to ensure high-quality releases.
Participate in the SRE team’s daily operations, including system monitoring, alerting, and incident response.
Implement post-deployment validation, health checks, and release safety mechanisms.
Help define and monitor SLAs, SLOs, and error budgets.
Contribute to reliability tooling, observability improvements, and performance diagnostics.
Participate in blameless postmortems and propose solutions to improve system stability.
Seniority level
Associate
Employment type
Contract
Job function
Industries
Airlines and Aviation