Whatfix

Senior Software Engineer in San Jose.

Whatfix, San Jose, California, United States, 95199

Senior Software Engineer – Reliability & Kubernetes (E5)

Location: San Jose, CA (Onsite)

We are looking for an experienced Software Engineer (E5) who is passionate about building systems that are resilient, observable, and designed for scale from day one. This role sits within our Reliability Engineering charter and focuses on strengthening the core platform that powers all Whatfix products, including our next‑generation AI offerings.

You will design and implement reliability frameworks, evolve our Kubernetes‑based infrastructure, and create automation that allows engineering teams to operate their services with confidence. This is a senior individual contributor role where you will directly influence system architecture, lead reliability initiatives across teams, and mature the technical foundations required to support our enterprise and federal customers.

Candidate must be authorized to work in the United States on a full‑time basis without employer sponsorship, either now or in the future.

What You’ll Own

Architect and deliver platform components that improve reliability, fault tolerance, and system performance

Build reusable tooling and automation to reduce manual operations and scale reliability practices across engineering

Lead the design and rollout of observability and monitoring frameworks that give teams deep visibility into their services

Serve as a technical escalation point for critical incidents and drive long‑term remediation through blameless RCAs

Strengthen our Kubernetes platform with better automation, deployment workflows, and resource efficiency

Partner with engineering, platform, and product teams to define SLIs/SLOs and embed them into how we operate services

Support on‑prem and regulated environment deployments by ensuring high availability and compliance requirements are met

What You’ll Bring

Strong hands‑on programming experience in Java (Python or Go is a bonus)

Expertise running and scaling Kubernetes workloads in production environments

Experience with GitOps practices and tooling (ArgoCD, Helm)

Strong grounding in CI/CD, infrastructure as code, and automated deployment pipelines

Background in observability (metrics, logs, traces) and designing systems that are measurable and diagnosable

Proven experience driving post‑incident reviews and converting findings into permanent engineering improvements

Ability to break down complex distributed systems problems into practical, high‑impact solutions

Nice‑to‑Have Experience

Log aggregation tools or stacks (e.g., ELK)

Chaos engineering or resilience testing approaches

Building internal developer platforms or reliability frameworks

Exposure to large‑scale or regulated enterprise environments

Who Thrives in This Role

Engineers who enjoy working across systems, infrastructure, and platform layers

ICs who like solving ambiguous problems and setting high technical standards

People who think in automation, self‑healing patterns, and long‑term system health

Engineers who want their work to directly influence the reliability posture of company‑wide products

Soft Skills That Matter

Strong ownership and problem‑solving mindset

Ability to collaborate across multiple engineering groups

Clear communication, especially during high‑pressure incident scenarios

Mentoring and uplifting other engineers through reviews, patterns, and best practices

Benefits

Uncapped incentives

Equity plan

Mac shop, work with the newest technologies

Unlimited PTO policy

Paid maternity/paternity leave

Monthly cell phone stipend

Medical, Dental, and Vision coverage (Whatfix pays 80% of the premium for individuals and their families; for the HSA, Whatfix contributes $1,000 for individuals and $2,000 for a family)

Team and company outings

Learning and Development benefits

At Whatfix, we value collaboration, innovation, and human connection. We believe that working together in the office five days a week fosters open communication, strengthens our community, and drives innovation, helping us achieve our goals more effectively.

To facilitate global collaboration, our US teams start and end early, while our India teams start and end late. US teams do not have any evening meetings. Relocation and Sponsorship offered.

Whatfix is an Equal Opportunity Employer and an E‑Verify participant. All activities must comply with Equal Opportunity laws, the ADA, and other applicable regulations.

We are an equal opportunity employer and value diverse people because of, not in spite of, their differences. We do not discriminate on the basis of race, religion, color, national origin, ethnicity, gender, sexual orientation, age, marital status, veteran status, or disability status.
