Traversal Inc.

AI Engineer - Site Reliability Researcher, New York

Traversal Inc., Harvard, Illinois, United States, 60033

About Traversal

Traversal is the AI Site Reliability Engineer (SRE) for the enterprise, trusted by some of the largest companies to troubleshoot, remediate, and prevent complex production incidents. Our mission is to free engineers from firefighting and enable them to focus on creative, high-impact work. Our roots are in AI research, and we’re building the premier AI agent lab for the enterprise with a team that includes researchers from MIT, Harvard, and Berkeley, as well as engineers from industry.

The Role

As an AI Site Reliability Researcher, you’ll help ensure the scalability, reliability, and observability of our AI platform. This is a high-impact, cross-functional role where you’ll design systems and processes to keep our AI-driven infrastructure healthy and performant. You’ll contribute during a phase of rapid growth and scale, supporting deployments and developer workflows across hybrid environments (SaaS and on-prem). You’ll help establish SRE practices to enable thoughtful and reliable scale, define change management across deployment environments, build internal observability from the ground up, and shape how AI integrates with real-world production environments. You’ll be a hands-on user of Traversal, with feedback that directly shapes the product, and you’ll collaborate with infra and AI agent teams.

Responsibilities

Brains Of The Product:

Distill SRE knowledge into agentic workflows.

System Design & Architecture:

Build scalable, resilient infrastructure to support AI observability agents in cloud and on-prem environments.

Observability:

Build systems to monitor logs, metrics, and traces tied to deployments and developer activity. Be a power user of observability tools.

Incident Management:

Define and lead on-call and incident response processes, including alerting, debugging, and postmortems.

CI/CD & Deployment:

Design and scale in-house CI/CD systems to support safe, efficient rollouts across hybrid environments.

Infrastructure Automation:

Own the infrastructure-as-code stack and improve automation across deployment and provisioning workflows.

Requirements

Experience as an SRE, infrastructure engineer, or similar role in fast-paced environments.
Exceptional debugging skills across complex, distributed systems and the ability to root-cause issues quickly across varied tech stacks.
Strong systems design intuition; understands how observability tools fit into architecture and how to leverage them in incident response.
Experience with observability tools (e.g., Datadog, Grafana, Prometheus, OpenTelemetry) and incident response.
Deep understanding of infrastructure automation and CI/CD systems.
Hands-on experience with Terraform, Kubernetes, and cloud environments (AWS or GCP).
Ability to debug distributed systems and drive system-level improvements.
Experience supporting hybrid cloud/on-prem deployments and complex change management.

Nice to Have

Familiarity with AI infrastructure or supporting ML/LLM workloads in production.
Background in developer productivity tooling or internal platform teams.
Prior experience building systems that connect infra events to developer workflows.
Exposure to agentic systems or AI observability platforms.

Compensation

We offer competitive compensation, startup equity, health insurance, and additional benefits. The U.S. base salary range for this full-time, in-person role in New York is $150,000–$300,000, plus equity and benefits. Our salary ranges are based on location, level, and role. Individual compensation is determined by experience, skills, and job-related knowledge.

Why You Should Join Us

We’ll provide health insurance, a great tech setup, flexible time off, and in-office snacks. We offer competitive salary and equity packages, and we hire thoughtfully to grow our small, high-impact team. Traversal is fully in-office, 5 days a week, based in New York near Madison Square Park. We value collaboration, hard work, and building the future of AI-powered software maintenance. Working here means owning meaningful parts of the product, moving fast, and learning constantly. This is a place to grow your career, make a real impact, and help define a new category of infrastructure software.

Apply for this job

To apply, please submit your information through the form above. This position is based in NYC and requires onsite work five days per week.

Voluntary Self-Identification

For government reporting purposes, we ask candidates to respond to the below self-identification survey. Completion is voluntary. Information provided will be kept confidential and used as part of Equal Employment Opportunity reporting. We do not discriminate on the basis of protected status. If you belong to protected veteran categories or have a disability, you may voluntarily provide information for reporting purposes, as described in the self-identification forms. See the page for more details.