OCTOPYD

Head of Site Reliability Engineering (San Jose)

OCTOPYD, San Jose


Join a lean, fast team building at the frontier of AI, electronic design automation, and systems engineering. We are hiring a Head of Site Reliability Engineering. Investors include Khosla Ventures, Cerberus, and Clear Ventures.

The Opportunity

We need rock-solid, low-latency deployments, often inside customer data centers with no internet egress. As our first dedicated reliability owner, you'll design, automate, and operate these hybrid/on-prem environments so customers experience "five nines" availability without touching the underlying plumbing.

What You'll Do

  • Automate the stack: build IaC pipelines (Terraform), GitOps workflows, and zero-downtime rollout strategies.
  • Observe & respond: instrument apps with Prometheus/Grafana, set SLOs/SLIs, lead incident response, perform root-cause analysis, and harden runbooks.
  • Secure & comply: implement network segmentation, secrets management, RBAC, and vulnerability scanning to satisfy strict semiconductor-industry requirements.
  • Collaborate: pair with product engineers on performance profiling, scalability bottlenecks, and customer issue triage.
  • Continually improve: champion best practices in testing, CI/CD, and chaos drills to push our "ship fast, ship quality" culture.

Must-Have Skills

  • 5+ years building and operating production systems as an SRE / DevOps / Platform Engineer.
  • Hands-on expertise with Kubernetes and Docker in hybrid or bare-metal setups.
  • Strong Python for automation tooling; proficiency reading TypeScript services.
  • Deep Linux administration knowledge (kernel tuning, networking, storage, security hardening).
  • Observability stack experience (Prometheus, Grafana, Loki / ELK, Alertmanager).
  • Proficiency with Terraform (or equivalent IaC) and Git-based workflows.
  • Excellent communication and a bias for action when facing vague, first-of-its-kind problems.
  • Experience running GPU workloads, ML inference, or EDA toolchains in production.
  • Familiarity with air-gapped / restricted-network deployments and data-center operations.
  • Exposure to security certifications (SOC 2, ISO 27001) or semiconductor customer audits.
  • Prior work at an early-stage startup.

Our Culture (What You'll Thrive In)

  • Challenge status quo
  • Strong opinions, loosely held
  • Ship fast, ship quality
  • Proud of our craft

  • Seniority level: Mid-Senior level
  • Employment type: Full-time
  • Industries: Semiconductor Manufacturing


Benefits (inferred from the description for this job)

  • Medical insurance
  • Vision insurance
  • 401(k)
