Start2Scale
Senior Site Reliability Engineer (MLOps) | Market Leading AI Product
Start2Scale, Seattle, Washington, US, 98127
Overview
Start2Scale is a talent acquisition advisory and recruitment consultancy serving tech startups and scaleups. Our client is a Gartner Magic Quadrant Leader whose proprietary AI-enabled discovery technology is used by thousands of companies worldwide.
The Opportunity
Join a small, high-visibility team reporting to CxO leadership to tackle critical internal capability and organizational uplift initiatives. This role provides direct access to executive leadership and the opportunity to shape the technical direction of an AI-powered platform used by thousands of businesses globally.
Your Role
As a Senior AIOps Engineer, you will be responsible for turning research into production-ready, scalable AI services that operate with sub-second latency at massive scale. You will influence strategy, implement robust systems, and drive continuous improvement across the AI lifecycle.
What You’ll Do
Observability & Reliability: Define and monitor SLIs/SLOs for model latency, throughput, accuracy, drift, and cost; integrate logging, tracing, and metrics; establish alerting and on-call practices.
Data & Feature Engineering: Build scalable data pipelines ingesting clickstream logs, metadata, images, and signals; implement real-time and offline feature extraction, validation, and lineage tracking.
Performance & Cost Optimization: Profile models/services; use hardware acceleration (GPU/TPU), libraries (ONNX, OpenVINO); implement caching; right-size clusters to balance performance and cost.
Governance & Compliance: Incorporate security, privacy, and responsible-AI checks; manage secrets and access controls; ensure auditability and reproducibility through documentation and artifact tracking.
Collaboration & Mentorship: Partner with Data Scientists, Product Owners, and SREs; coach junior engineers on MLOps and share knowledge across the org.
Productionization & Packaging: Convert notebooks into production-ready Python/Go microservices or pipelines; design reproducible build pipelines and manage artifacts in registries.
Scalable Deployment: Orchestrate real-time and batch inference on Kubernetes and cloud managed services; implement blue-green/canary rollouts, rollbacks, and model versioning strategies.
MLOps & CI/CD: Build and maintain CI/CD pipelines; automate feature store updates, retraining triggers, and scheduled batch jobs using orchestration tools.
You Might Be a Fit If You Have
5+ years in software engineering with 3+ years deploying ML/AI systems at enterprise scale
Strong coding skills in Python and at least one statically typed language (Golang preferred)
Hands-on experience with containers (Docker), Kubernetes, and cloud platforms (AWS/GCP/Azure)
Proven track record building CI/CD pipelines and automated testing for ML workloads
Deep understanding of REST/gRPC APIs, message queues, and streaming/batch processing
Technical Depth
Experience implementing monitoring, alerting, and logging for mission-critical services
Familiarity with ML lifecycle tools (MLflow, Kubeflow, SageMaker, Vertex AI, Feature Stores)
Knowledge of feature engineering, model evaluation, A/B testing, and drift detection
Understanding of performance optimization and cost management at scale
Leadership Qualities
Ability to influence technical decisions at the executive level
Experience mentoring engineers and driving best-practice adoption
Strong communication with technical and non-technical stakeholders
Track record solving high-impact, organization-wide technical challenges
Work Arrangement
This is a permanent role with flexible remote work in the Seattle region. The team is remote-first with an expectation of 1 day in the office.
Why This Role Is Exceptional
Executive Access & Impact
Direct reporting to CxO leadership with organization-wide visibility
Influence strategic technical decisions and product direction
Access to executive meetings and the strategic layer
Collaborate with three other senior engineers on critical internal initiatives
Fast-track career progression within a market-leading company
Exposure to cutting-edge AI/ML technologies at scale
Market Leadership
Work with a Gartner Magic Quadrant Leader with proven market dominance
Impact millions of users through highly scalable AI-powered systems
Technical Excellence
Access to world-class infrastructure and technical resources
Opportunity to work with the latest MLOps tools and technologies
Solve complex problems rarely encountered by engineers
CI/CD: GitHub Actions, Terraform, automated testing frameworks
Equal Opportunity
Start2Scale is committed to building an inclusive workplace and welcomes applicants regardless of race, age, religion, sex, gender identity, sexual orientation, marital status, color, veteran status, disability, or socioeconomic background.