The Rundown AI, Inc.

AI Infrastructure Engineer, Agents

The Rundown AI, Inc., San Francisco

As a Software Engineer on the ML Infrastructure team, you will design and build our agent sandboxing platform: the secure, high-performance code execution layer powering our agentic workflows. This system underpins critical applications and research initiatives, and is deployed across both internal and customer-managed environments.

This position requires deep expertise in systems engineering: operating systems, virtualization, networking, containers, and performance optimization. Your work will directly enable agents to execute untrusted or user-submitted code safely, efficiently, and repeatably, with fast startup times, strong isolation guarantees, and support for snapshotting and inspection.

You will:

  • Design and build the sandboxing platform for code execution across containerized and virtualized environments.
  • Ensure strong isolation, security, and reproducibility of execution across user sessions and workloads.
  • Optimize for cold-start latency, memory footprint, and resource utilization at scale.
  • Collaborate across security, infra, and product teams to support both internal research use cases and enterprise customer deployments.
  • Lead architecture reviews and own projects from design through deployment in fast-paced, cross-functional settings.

Ideally you'd have:

  • 3+ years of experience building high-performance systems software (e.g., OS, container runtime, VMM, networking stack).
  • Deep understanding of Linux internals, including process isolation, memory management, cgroups, and namespaces.
  • Experience with containerization and virtualization technologies (e.g., Docker, Firecracker, gVisor, QEMU, Kata Containers).
  • Proficiency in a systems programming language such as Go, Rust, or C/C++.
  • Familiarity with networking, security hardening, sandboxing techniques, and kernel-level performance tuning.
  • Comfort working across infrastructure layers, from kernel modules to orchestration frameworks (e.g., Kubernetes).
  • Strong debugging skills and the ability to make performance/security tradeoffs in production systems.

Nice to have:

  • Familiarity with LLM agents and agent frameworks or protocols (e.g., OpenHands, Agent2Agent, MCP).
  • Experience running secure workloads in multi-tenant or untrusted environments (e.g., FaaS, CI sandboxes, remote notebooks).
  • Exposure to snapshotting and restore techniques (e.g., CRIU, VM snapshots, overlayfs).