Mastech Digital

Site Reliability Engineer (Phoenix)

Mastech Digital, Phoenix, Arizona, United States, 85003

Role Overview

We're seeking a Site Reliability Engineer (SRE) with strong full-stack development expertise and hands-on experience in observability, automation, and reliability engineering. The ideal candidate will design monitoring solutions, optimize system performance, and drive reliability across distributed applications and infrastructure.

Must-Have Technical Skills (Level 3, 5-7+ Years)

Full Stack Development: Strong ability to navigate across front-end, back-end, and infrastructure layers for debugging and optimization.

Observability: Deep understanding of logs, metrics, and traces for system monitoring and diagnostics.

Monitoring & Analysis Tools: Dynatrace, BigPanda, Evolven, ThousandEyes.

Nice to Have Skills

Advanced experience with Grafana or Kibana for analytics and visualization.

Familiarity with cloud platforms (AWS/Azure/GCP) and infrastructure-as-code tools.

Key Responsibilities

Define and implement standardized methods to collect and analyze logs, traces, and metrics across systems and applications.

Develop dashboards and monitoring frameworks to improve visibility into system health and performance.

Collaborate with development teams to enhance service reliability, optimize deployments, and streamline release processes.

Conduct root cause analysis, performance tuning, and fault detection using observability tools.

Participate in system design reviews, platform management, and capacity planning.

Build automation pipelines to reduce manual operations, improve efficiency, and ensure sustainable systems.

Establish and maintain Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to ensure uptime and performance standards are met.

Education

Bachelor's degree preferred, but not required (Computer Science, Engineering, or related field).