OCTOPYD
Join a lean, fast team building at the frontier of AI, electronic design automation, and systems engineering. We are hiring a Head of Site Reliability Engineering. Investors include Khosla Ventures, Cerberus, and Clear Ventures.
The Opportunity
We need rock‑solid, low‑latency deployments—often inside customer data centers with no internet egress. As our first dedicated reliability owner, you’ll design, automate and operate these hybrid/on‑prem environments so customers experience “five nines” availability without touching the underlying plumbing.
What You’ll Do
Automate the stack
– build IaC pipelines (Terraform), GitOps workflows and zero‑downtime rollout strategies.
Observe & respond
– instrument apps with Prometheus/Grafana, set SLOs/SLIs, lead incident response, perform root‑cause analysis, and harden runbooks.
Secure & comply
– implement network segmentation, secrets management, RBAC and vulnerability scanning to satisfy strict semiconductor‑industry requirements.
Collaborate
– pair with product engineers on performance profiling, scalability bottlenecks and customer issue triage.
Continually improve
– champion best practices in testing, CI/CD, and chaos drills to push our “ship fast, ship quality” culture.
Must‑Have Skills
• 5+ years building and operating production systems as an SRE / DevOps / Platform Engineer.
• Hands‑on expertise with Kubernetes and Docker in hybrid or bare‑metal setups.
• Strong Python for automation tooling; proficiency reading TypeScript services.
• Deep Linux administration knowledge (kernel tuning, networking, storage, security hardening).
• Observability stack experience (Prometheus, Grafana, Loki / ELK, Alertmanager).
• Proficiency with Terraform (or equivalent IaC) and Git‑based workflows.
• Excellent communication and a bias for action when facing vague, first‑of‑its‑kind problems.
• Experience running GPU workloads, ML inference or EDA toolchains in production.
• Familiarity with air‑gapped / restricted‑network deployments and data‑center operations.
• Exposure to security certifications (SOC 2, ISO 27001) or semiconductor customer audits.
• Prior work at an early‑stage startup.
Our Culture (What You’ll Thrive In)
• Challenge status‑quo
• Strong opinions, loosely held
• Ship fast, ship quality
• Proud of our craft
Seniority level: Mid-Senior level
Employment type: Full-time
Industries: Semiconductor Manufacturing