Join a lean, fast team building at the frontier of AI, electronic design automation, and systems engineering. We are hiring a Head of Site Reliability Engineering. Investors include Khosla Ventures, Cerberus, and Clear Ventures.
The Opportunity
We need rock-solid, low-latency deployments, often inside customer data centers with no internet egress. As our first dedicated reliability owner, you'll design, automate, and operate these hybrid/on-prem environments so customers experience five-nines availability without touching the underlying plumbing.
What You'll Do
- Automate the stack: build IaC pipelines (Terraform), GitOps workflows, and zero-downtime rollout strategies.
- Observe & respond: instrument apps with Prometheus/Grafana, set SLOs/SLIs, lead incident response, perform root-cause analysis, and harden runbooks.
- Secure & comply: implement network segmentation, secrets management, RBAC, and vulnerability scanning to satisfy strict semiconductor-industry requirements.
- Collaborate: pair with product engineers on performance profiling, scalability bottlenecks, and customer issue triage.
- Continually improve: champion best practices in testing, CI/CD, and chaos drills to push our "ship fast, ship quality" culture.
Must-Have Skills
- 5+ years building and operating production systems as an SRE / DevOps / Platform Engineer.
- Hands-on expertise with Kubernetes and Docker in hybrid or bare-metal setups.
- Strong Python for automation tooling; proficiency reading TypeScript services.
- Deep Linux administration knowledge (kernel tuning, networking, storage, security hardening).
- Observability stack experience (Prometheus, Grafana, Loki / ELK, Alertmanager).
- Proficiency with Terraform (or equivalent IaC) and Git-based workflows.
- Excellent communication and a bias for action when facing vague, first-of-its-kind problems.
- Experience running GPU workloads, ML inference or EDA toolchains in production.
- Familiarity with air-gapped / restricted-network deployments and data-center operations.
- Exposure to security certifications (SOC 2, ISO 27001) or semiconductor customer audits.
- Prior work at an early-stage startup.
Our Culture (What You'll Thrive In)
- Challenge the status quo
- Strong opinions, loosely held
- Ship fast, ship quality
- Proud of our craft
Seniority level
Mid-Senior level
Employment type
Full-time
Job function
Industries
Semiconductor Manufacturing
Referrals increase your chances of interviewing at OCTOPYD by 2x
Inferred from the description for this job
Medical insurance
Vision insurance
401(k)