Start2Scale

Senior Site Reliability Engineer (MLOps) | Market Leading AI Product

Start2Scale, Seattle, Washington, US, 98127

Overview

Start2Scale is a talent acquisition advisory and recruitment consultancy serving tech startups and scaleups. Our client is a Gartner Magic Quadrant Leader whose proprietary AI-enabled discovery technology is used by thousands of companies worldwide.

The Opportunity

Join a small, high-visibility team reporting to CxO leadership to tackle critical internal capability and organizational uplift initiatives. This role provides direct access to executive leadership and the opportunity to shape the technical direction of an AI-powered platform used by thousands of businesses globally.

Your Role

As a Senior AIOps Engineer, you will be responsible for turning research into production-ready, scalable AI services that operate with sub-second latency at massive scale. You will influence strategy, implement robust systems, and drive continuous improvement across the AI lifecycle.

What You’ll Do

Observability & Reliability: Define and monitor SLIs/SLOs for model latency, throughput, accuracy, drift, and cost; integrate logging, tracing, and metrics; establish alerting and on-call practices.

Data & Feature Engineering: Build scalable data pipelines ingesting clickstream logs, metadata, images, and signals; implement real-time and offline feature extraction, validation, and lineage tracking.

Performance & Cost Optimization: Profile models and services; use hardware acceleration (GPU/TPU) and libraries such as ONNX and OpenVINO; implement caching; right-size clusters to balance performance and cost.

Governance & Compliance: Incorporate security, privacy, and responsible-AI checks; manage secrets and access controls; ensure auditability and reproducibility through documentation and artifact tracking.

Collaboration & Mentorship: Partner with Data Scientists, Product Owners, and SREs; coach junior engineers on MLOps and share knowledge across the org.

Productionization & Packaging: Convert notebooks into production-ready Python/Go microservices or pipelines; design reproducible build pipelines and manage artifacts in registries.

Scalable Deployment: Orchestrate real-time and batch inference on Kubernetes and cloud managed services; implement blue-green/canary rollouts, rollbacks, and model versioning strategies.

MLOps & CI/CD: Build and maintain CI/CD pipelines; automate feature store updates, retraining triggers, and scheduled batch jobs using orchestration tools.

You Might Be a Fit If You Have

5+ years in software engineering with 3+ years deploying ML/AI systems at enterprise scale

Strong coding skills in Python and at least one statically typed language (Golang preferred)

Hands-on experience with containers (Docker), Kubernetes, and cloud platforms (AWS/GCP/Azure)

Proven track record building CI/CD pipelines and automated testing for ML workloads

Deep understanding of REST/gRPC APIs, message queues, and streaming/batch processing

Technical Depth

Experience implementing monitoring, alerting, and logging for mission-critical services

Familiarity with ML lifecycle tools (MLflow, Kubeflow, SageMaker, Vertex AI, Feature Stores)

Knowledge of feature engineering, model evaluation, A/B testing, and drift detection

Understanding of performance optimization and cost management at scale

Leadership Qualities

Ability to influence technical decisions at the executive level

Experience mentoring engineers and driving best-practice adoption

Strong communication with technical and non-technical stakeholders

Track record solving high-impact, organization-wide technical challenges

Work Arrangement

This is a permanent role with flexible remote work in the Seattle region. The team is remote-first with an expectation of 1 day in the office.

Why This Role Is Exceptional

Executive Access & Impact

Direct reporting to CxO leadership with organization-wide visibility

Influence strategic technical decisions and product direction

Access to executive meetings and the strategic layer

Collaborate with three other senior engineers on critical internal initiatives

Fast-track career progression within a market-leading company

Exposure to cutting-edge AI/ML technologies at scale

Market Leadership

Work with a Gartner Magic Quadrant Leader with proven market dominance

Impact millions of users through highly scalable AI-powered systems

Technical Excellence

Access to world-class infrastructure and technical resources

Opportunity to work with the latest MLOps tools and technologies

Solve complex problems rarely encountered by engineers

CI/CD: GitHub Actions, Terraform, automated testing frameworks

Equal Opportunity

Start2Scale is committed to building an inclusive workplace and welcomes applicants regardless of race, age, religion, sex, gender identity, sexual orientation, marital status, color, veteran status, disability, or socioeconomic background.
