Microsoft
Site Reliability Engineer II - CTJ - Poly
Microsoft has an exciting opportunity for a Site Reliability Engineer II in the Cloud+AI Azure Data Team.
Microsoft’s Azure Data engineering team is leading the transformation of analytics in the world of data with products like databases, data integration, big data analytics, messaging & real-time analytics, and business intelligence. The products in the Azure Data portfolio include Microsoft Fabric, Azure SQL Databases, Azure Cosmos Databases, Azure PostgreSQL, Azure Data Factory, Azure Synapse Analytics, Azure Service Bus, Azure Event Grid, and Power BI. Our mission is to build a data platform for the age of AI, powering a new class of data-first applications and driving a data culture. This team will be responsible for deploying and operating our Azure Data services in a Secure Work Area, including the infrastructure for collaboration within an Air-Gapped environment.
In this role, you will have the opportunity to work with engineers who enable a broad set of Azure services to be consumed by internal and external customers in highly secure and regulated industries. The systems and software you build will be required to meet the security policy and assurance requirements of both public and private sector customers.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities
Acts as a Designated Responsible Individual (DRI) on call to monitor service for degradation, downtime, or interruptions. Alerts stakeholders as to status and gains approval to restore system/product/service for simple problems. Responds within Service Level Agreement (SLA) timeframe. Escalates issues to appropriate owners.
Contributes to efforts to collect, classify, and analyze data with little oversight on a range of metrics (e.g., health of the system, where bugs might be occurring). Refines product features by escalating findings from analyses to inform decisions regarding the engineering of products.
Contributes to the development of automation within production and deployment of a complex product feature. Runs code in simulated or non-production environments to confirm functionality and error-free runtime for products with little to no oversight.
Contributes to efforts to ensure the correct processes are followed to achieve a high degree of security, privacy, safety, and accessibility. Checks for visible evidence to demonstrate compliance for product areas. Develops and holds an understanding of the implications of onboarding new technologies following expectations of compliance at Microsoft.
Remains current in skills by staying abreast of developments that improve availability, reliability, efficiency, observability, and performance of products while driving consistency in monitoring and operations at scale.
Applies best practices to reliably build code that is based on well-established methods. Follows best practices for product development and scaling to customer requirements and applies best practices for meeting scaling needs and performance expectations.
Maintains communication with key partners across the Microsoft ecosystem of engineers. Considers partners across teams and their end goals for products to drive and achieve desirable user experiences and fit the dynamic needs of partners/customers through product development.
Maintains operations of live service as issues arise on a rotational, on-call basis. Implements solutions and mitigations for more complex issues impacting performance or functionality of the Live Site service and escalates as necessary. Reviews and writes incident postmortems and shares insights with the team.
Acts as a Designated Responsible Individual (DRI) and guides other engineers by developing and following the playbook, working on call to monitor system/product/service for degradation, downtime, or interruptions. Alerts stakeholders as to status and initiates actions to restore system/product/service for simple problems and complex problems when appropriate. Responds within SLA timeframe. Drives efforts to reduce incident volume and escalates issues to appropriate owners.
Drives efforts to integrate instrumentation for gathering telemetry data on system behavior such as performance, reliability, availability, usage, and safety mechanisms. Creates telemetry outputs like notifications or dashboards.
Drives efforts to collect, classify, and analyze data on a range of metrics and drives product refinement through data analytics and data integration.
Qualifications
Required/minimum qualifications
Bachelor’s Degree in Computer Science or related technical field AND 2+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python; OR equivalent experience.
Other Requirements
Security Clearance Requirements: The successful candidate must meet Microsoft, customer and/or government security screening requirements, including active U.S. Government Top Secret Clearance with access to Sensitive Compartmented Information (SCI) based on an SSBI with Polygraph, and related verification requirements. Failure to maintain or obtain the appropriate clearance may result in employment action.
Microsoft Cloud Background Check: Must pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Citizenship & Citizenship Verification: This position requires verification of U.S. citizenship due to citizenship-based legal restrictions. Verification may include a passport, other approved documents, or a verified U.S. government clearance.
Additional or Preferred Qualifications
Master’s Degree in Computer Science or related field AND 3+ years of engineering experience with coding in the listed languages; OR Bachelor’s Degree AND 5+ years of experience; OR equivalent experience.
Experience working on large-scale distributed services with on-call responsibilities.
Ability to build and influence broadly towards common goals and priorities.
Ownership of end-to-end project lifecycle with solid project management and communication skills.
Microsoft will accept applications for the role until October 26, 2025.
Details
Seniority level: Not Applicable
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Software Development
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation during the application process, read more about requesting accommodations.
Locations & Related Roles
Site Reliability Engineer - CTJ - Top Secret — Redmond, WA
Site Reliability Engineer, CTJ — Seattle/Redmond/Bellevue, WA