ServiceNow
Senior Staff DevOps Engineer - Cloud Analytics & FinOps Engineering Platform
ServiceNow, Pleasanton, California, United States, 94566
It all started in sunny San Diego, California, in 2004, when a visionary engineer, Fred Luddy, saw the potential to transform how we work. Fast forward to today, and ServiceNow stands as a global market leader, bringing innovative AI-enhanced technology to over 8,100 customers, including 85% of the Fortune 500. Our intelligent cloud-based platform seamlessly connects people, systems, and processes to empower organizations to find smarter, faster, and better ways to work. But this is just the beginning of our journey. Join us as we pursue our purpose to make the world work better for everyone.
Job Description
Join the Global Cloud Services organization as the founding member of our Cloud Analytics & FinOps Engineering Platform team. You will be instrumental in establishing the technical foundation and architectural direction for ServiceNow's next-generation FinOps governance platform. We are building a modern, secure, and highly scalable multi-cloud data platform infrastructure powering next-generation analytics to support ServiceNow's Cloud and AI growth.
As our Senior Staff DevOps Engineer for Cloud Analytics & FinOps Engineering Platform, you will architect, secure, and operationalize our hybrid cloud data platform infrastructure spanning AWS, GCP, Azure, and on-premises systems. You will have ownership over CI/CD pipelines, infrastructure-as-code, platform security, cost optimization, observability, and data source integrations across our complex ecosystem while navigating ServiceNow's enterprise infrastructure standards and compliance requirements.
This is a unique opportunity to build enterprise-grade platform infrastructure from the ground up, establish DevOps best practices for modern data platforms, and work within a Fortune 500 enterprise environment with global scale requirements.
What you get to do in this role:
Platform Infrastructure & Architecture
Design and implement secure, scalable Kubernetes clusters across AWS EKS, GCP GKE, and Azure AKS supporting complex data platform workloads. Architect hybrid cloud infrastructure with unified management and governance, building infrastructure-as-code solutions using Terraform, AWS CDK, and CloudFormation for repeatable deployments. Establish multi-cloud networking including VPC design, cross-cloud connectivity, Transit Gateway configurations, and secure service mesh implementations while navigating ServiceNow enterprise standards and approval processes.
Security & Compliance
Implement comprehensive security frameworks across the multi-cloud data platform stack, adhering to enterprise security standards. Design identity and access management across cloud providers following the principle of least privilege, orchestrate secrets management using cloud-native solutions, and establish security scanning for container images and infrastructure. Ensure compliance with SOC2, FedRAMP, and regulatory requirements while working with security teams to implement platform controls and data governance.
CI/CD Pipeline Engineering & GitOps
Design sophisticated CI/CD pipelines using Jenkins, GitHub Actions, TeamCity, and Argo CD for GitOps workflows. Manage artifact repositories with automated image scanning and promotion, create Helm charts for complex data platform services (Trino, Airflow, Lightdash, Grafana), and establish automated testing pipelines for infrastructure changes with drift detection and remediation.
Observability & Site Reliability Engineering
Architect comprehensive monitoring using Grafana, Prometheus, and CloudWatch with advanced alerting and incident response frameworks. Design SLIs/SLOs/SLAs for data platform services with error budget management, establish SRE practices including toil reduction and capacity planning, and create operational dashboards for platform health and performance metrics. Implement automated remediation workflows and capacity forecasting with predictive analytics.
Data Platform Operations & Integration
Design secure data ingestion pipelines from disparate systems across multi-cloud and on-premises environments. Implement data source connectors for billing systems, ServiceNow internal systems, SaaS platforms, and ML platforms. Manage hybrid cloud connectivity and orchestrate complex data workflows using Apache Airflow with high availability across multiple cloud environments.
Platform Automation & Developer Experience
Implement automated scaling and resource management across cloud providers. Establish a Cloud Development Environment (CDE) platform using Coder to provision on-demand development workspaces via Terraform templates for globally distributed teams, with enterprise compliance and cost optimization.
Enterprise Navigation & Global Operations
Work within ServiceNow enterprise processes for technology approvals and infrastructure changes. Mentor junior engineers across global time zones on SRE best practices, establish operational runbooks for 24/7 platform support with automated incident response, and implement SRE organizational practices including error budget policies and reliability reviews.
Qualifications
To be successful in this role you have:
Experience in leveraging or critically thinking about how to integrate AI into work processes, decision-making, or problem-solving. This may include using AI-powered tools, automating workflows, analyzing AI-driven insights, or exploring AI's potential impact on the function or industry.
10+ years of DevOps/Platform engineering experience with large-scale distributed systems in enterprise environments
Expert-level Kubernetes knowledge across multiple cloud providers (EKS, GKE, AKS) including service mesh and cluster management
Multi-cloud expertise across AWS, GCP, and Azure with deep understanding of platform strengths and cost models
Advanced Infrastructure-as-Code experience with Terraform, CloudFormation, and AWS CDK
Proven CI/CD pipeline management using GitHub Actions, Jenkins, Argo CD, and GitOps workflows in enterprise environments
Strong security background with cloud security best practices and compliance frameworks (SOC2, FedRAMP)
Expertise in network security for cloud and Kubernetes environments, including VPC design, zero-trust networking, security policies, firewall rules, VPNs, and intrusion detection/prevention systems
Enterprise navigation skills with large organization processes and cross-team collaboration
Bachelor's degree in Computer Science, Engineering, or related technical field
Full professional proficiency in English
Technical Expertise:
Multi-Cloud & Container Orchestration: Docker, Kubernetes, and Helm across AWS, GCP, and Azure at enterprise scale with hybrid cloud networking including VPN, Direct Connect, ExpressRoute, and cross-cloud connectivity.
DevOps & Automation: CI/CD automation, GitOps workflows, Infrastructure-as-Code mastery, and scripting proficiency in Python, Bash, and Go for infrastructure management and toil reduction.
Data Platform Operations: Experience with Trino/Presto, Apache Airflow, dbt, analytics databases, and Cloud Development Environment platforms including Coder for workspace provisioning.
Site Reliability Engineering: SLI/SLO design, error budgets, chaos engineering, automated remediation, monitoring with Grafana/Prometheus/ELK stack, and performance engineering for distributed systems.
Database & Integration: PostgreSQL operations, on-premises integration with legacy systems, and hybrid cloud architectures across cloud and on-prem environments.
Security & Compliance Expertise: Multi-cloud security architecture with expertise in cloud-native security services, identity and access management across providers (AWS IAM, GCP IAM, Azure AD), enterprise compliance frameworks (SOC2, FedRAMP, PCI-DSS), secrets management, security scanning, and data privacy frameworks including GDPR and CCPA compliance.
FinOps & Cost Management: Multi-cloud cost optimization strategies, resource rightsizing with automated scaling, cost allocation models for multi-tenant platforms, FinOps tooling with enterprise budget constraints, and cloud cost negotiation experience with procurement teams.
Preferred Qualifications:
Data engineering background with modern data stack technologies
Service mesh experience with Istio, Linkerd, or cloud-native solutions
Enterprise platform experience at Fortune 500 companies
Global team leadership across multiple time zones
SRE certification or formal training from Google, AWS, or similar programs
Chaos engineering experience with tools like Chaos Monkey, Litmus, or Gremlin
Open-source contributions to DevOps, Kubernetes, or SRE tools
Multi-cloud certifications