The Rundown AI, Inc.

AI Infrastructure Engineer, Agents

The Rundown AI, Inc., San Francisco, California, United States, 94199

As a Software Engineer on the ML Infrastructure team, you will design and build our agent sandboxing platform: the secure, high-performance code execution layer powering our agentic workflows. This system underpins critical applications and research initiatives, and is deployed across both internal and customer-managed environments. This position requires deep expertise in systems engineering: operating systems, virtualization, networking, containers, and performance optimization. Your work will directly enable agents to execute untrusted or user-submitted code safely, efficiently, and repeatedly, with fast startup times, strong isolation guarantees, and support for snapshotting and inspection.

You will:

Design and build the sandboxing platform for code execution across containerized and virtualized environments.
Ensure strong isolation, security, and reproducibility of execution across user sessions and workloads.
Optimize for cold-start latency, memory footprint, and resource utilization at scale.
Collaborate across security, infra, and product teams to support both internal research use cases and enterprise customer deployments.
Lead architecture reviews and own projects from design through deployment in fast-paced, cross-functional settings.

Ideally you'd have:

3+ years of experience building high-performance systems software (e.g., OS, container runtime, VMM, networking stack).
Deep understanding of Linux internals, process isolation, memory management, cgroups, namespaces, etc.
Experience with containerization and virtualization technologies (e.g., Docker, Firecracker, gVisor, QEMU, Kata Containers).
Proficiency in a systems programming language such as Go, Rust, or C/C++.
Familiarity with networking, security hardening, sandboxing techniques, and kernel-level performance tuning.
Comfort working across infrastructure layers, from kernel modules to orchestration frameworks (e.g., Kubernetes).
Strong debugging skills and the ability to make performance/security tradeoffs in production systems.

Nice to haves:

Familiarity with LLM agents and agent frameworks (e.g., OpenHands, Agent2Agent, MCP).
Experience running secure workloads in multi-tenant or untrusted environments (e.g., FaaS, CI sandboxes, remote notebooks).
Exposure to snapshotting and restore techniques (e.g., CRIU, VM snapshots, overlayfs).