Vish Consulting Services, Inc
Job Description
As a Site Reliability Engineer (SRE), you will play a pivotal role in ensuring the reliability, scalability, and performance of our systems and applications. You will collaborate with cross-functional teams to define and monitor service level objectives (SLOs), automate operational tasks, and enhance the overall customer experience through robust testing and observability practices.
Key Responsibilities
Communicate reliability concerns and their impact to key stakeholders, product owners, and platform teams.
Design, develop, and maintain automated test frameworks using Cypress or Playwright for web and API testing.
Build and maintain integrations with CI/CD pipelines using Azure DevOps and GitHub Actions.
Participate in on-call rotations and provide operational support for SRE-managed systems, including weekends and holidays.
Define and monitor SLOs and SLIs in collaboration with product and platform owners.
Track reliability performance and ensure systems meet agreed SLOs.
Promote reliability as a feature throughout the software development lifecycle.
Create dashboards and reports to communicate key metrics to stakeholders.
Contribute to documentation and runbooks based on operational experience and feedback.
Mandatory Skills
Automation Testing: Experience designing and maintaining automated test frameworks using Cypress or Playwright.
Tech Stack Proficiency: Strong experience testing applications built with JavaScript, TypeScript, and GraphQL.
CI/CD Integration: Proven ability to integrate automated testing into CI/CD pipelines using Azure DevOps and GitHub Actions.
Preferred Qualifications
Bachelor's or Master's degree in Computer Science, Engineering, or related technical discipline.
5+ years of experience in designing and implementing large-scale production solutions.
Experience in the airline industry is a plus.
Nice to Have Skills
Experience with observability and monitoring tools: Dynatrace, Mezmo (LogDNA), BigPanda, Nucleus.
Familiarity with Azure Cloud and Kubernetes environments.
Strong understanding of SRE principles: monitoring, alerting, incident response, SLAs/SLOs/error budgets.
Passion for improving processes and building reliable systems.
Ability to conduct regression, performance, and security testing.
Experience in post-deployment validation and release safety mechanisms.
Participation in blameless postmortems and proposing reliability improvements.