Traversal
Overview
About Traversal
Traversal is the AI Site Reliability Engineer (SRE) for the enterprise, trusted by some of the largest companies to troubleshoot, remediate, and prevent complex production incidents. Our mission is to free engineers from firefighting and enable them to focus on creative, high-impact work. Our roots are in AI research, and we’re building the premier AI agent lab for the enterprise. We assemble a diverse team from academia and industry to tackle hard AI problems and deliver reliable, scalable infrastructure.
The Role
As an AI Platform Engineer at Traversal, you’ll work on the core foundations that make Traversal’s AI possible, spanning both agent infrastructure and evaluation systems. You’ll contribute to both research and engineering to accelerate the AI loop: build → evaluate → improve → ship.
Agent Infrastructure — Build the frameworks, orchestration layers, and developer tooling that power Traversal's AI agents for root cause analysis, alert triage, and "chat with your infrastructure/telemetry." Design scalable distributed systems and abstractions (e.g., MCP servers, multi-agent orchestration, toolkits) that balance research flexibility with production reliability.
Evaluation — Define what "good" looks like for AI performance in the incident management domain. Build live evaluation pipelines, automated scoring systems, and benchmarks; integrate evaluation into the developer lifecycle; surface these insights to customers as a value-add.
This work combines research (agentic architectures, benchmarking, calibration, finetuning) with engineering (production-scale infra, APIs, distributed systems) to accelerate the entire AI loop: build → evaluate → improve → ship.
Responsibilities
Design and build agent frameworks, orchestration layers, and developer tooling for Traversal's AI agents.
Architect scalable distributed systems to support real-time workloads over petabytes of heterogeneous telemetry data.
Build live evaluation pipelines, automated scoring systems, and benchmarks to measure and drive AI performance.
Integrate evaluation systems into the developer lifecycle to create a fast research-to-production loop.
Surface evaluation signals and benchmarks to customers as a core product capability.
Partner with research scientists to prototype and productionize agentic architectures.
Own observability, latency, and reliability for agents in production.
Evolve and scale the agent + evaluation platform as the backbone of Traversal's AI systems.
Requirements
Strong system design skills for distributed systems.
Proven production-scale software engineering experience.
Experience with LLM-based applications and/or multi-agent systems.
Strong data modeling skills and a track record of writing clean, maintainable code.
Collaborative, impact-driven mindset and ability to work across research and engineering teams.
Nice to Have
Knowledge of software incidents and production SRE workflows.
Prior experience with AI benchmarking or evaluation systems.
Experience creating quantitative scoring systems or benchmarks in new problem domains.
Familiarity with observability stacks (logs, metrics, traces) and telemetry systems.
Background in agentic architectures, orchestration frameworks, or applied AI research.
Compensation
We offer competitive compensation, startup equity, health insurance, and additional benefits. The U.S. base salary range for this full-time, in-person role in New York is $150,000–$300,000, plus equity and benefits. Our salary ranges are based on location, level, and role. Individual compensation is determined by experience, skills, and job-related knowledge.
Why You Should Join Us
We’ll make sure you’re fully supported with health insurance, a great tech setup, flexible time off, and plenty of in-office snacks. We offer competitive salary and equity packages, and give careful consideration to every hire on our small, high-impact team. Traversal is fully in-office, 5 days a week, based in New York near Madison Square Park. We have a collaborative, hard-working culture and are energized by building the future of AI-powered software maintenance.
Working here means owning meaningful parts of the product, having the flexibility to move fast, and learning constantly. This is a place to grow your career, make a real impact, and help define a new category of infrastructure software.
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Software Development