Alibaba Cloud

Cloud Infrastructure SRE

Alibaba Cloud, Sunnyvale, California, United States, 94087

Base pay range

$104,400.00/yr - $171,000.00/yr

This range is provided by Alibaba Cloud. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more.

Alibaba Cloud Native Observability Team: Responsible for observability products including Alibaba Cloud Log Service (SLS), Application Real-Time Monitoring Service (ARMS), and Cloud Monitoring Service (CMS). We are committed to creating a real-time, intelligent, and large-scale observation and analysis platform for the future. This platform aims to build intelligent operations (AIOps), big data security, business monitoring and analysis services to accelerate digital innovation.

Focus on Alibaba Cloud observability platforms (SLS/CMS/ARMS) in multinational cloud environments. Enhance system reliability and engineering delivery efficiency in these environments by implementing infrastructure automation, constructing SLO/SLI management systems, and optimizing scalable operations capabilities to ensure business continuity.

Build Automated Operations Systems: Design a reliability engineering framework that includes change management, capacity planning, and self-healing mechanisms to enhance the stability and resilience of infrastructure (compute/storage/network) through Infrastructure as Code (IaC).

Lead Standardized Observability Platform Delivery Framework Design: Establish risk assessment models and error budget mechanisms, and achieve quality control and efficiency optimization in the delivery process through automated toolchains.

Develop SRE-Based Metrics System: Continuously optimize service health assessment models, achieve automated tracking of SLOs/SLIs, and drive decision-making with observability data.

Experience: Over 3 years of experience in distributed systems reliability engineering, familiar with high-availability architecture design, and proficient in at least one of Python/Go/Java.

Automation: Ability to convert operations experience into automated solutions, and familiar with various observability software and systems.

SRE Practices: Familiar with core SRE practices (incident review/error budgeting/chaos engineering) and experienced in building automated risk control systems.

The pay range for this position at commencement of employment is expected to be between $104,400 and $171,000/year. However, base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If hired, employee will be in an “at-will position” and the Company reserves the right to modify base salary (as well as any other discretionary payment or compensation program) at any time, including for reasons related to individual performance, Company or individual department/team performance, and market factors.

Seniority level: Entry level

Employment type: Full-time

Industries: Software Development
