The Rundown AI, Inc.
As a Software Engineer on the ML Infrastructure team, you will design and build our agent sandboxing platform: the secure, high-performance code execution layer powering our agentic workflows. This system underpins critical applications and research initiatives, and is deployed across both internal and customer-managed environments.
This position requires deep expertise in systems engineering: operating systems, virtualization, networking, containers, and performance optimization. Your work will directly enable agents to execute untrusted or user-submitted code safely, efficiently, and repeatedly, with fast startup times, strong isolation guarantees, and support for snapshotting and inspection.
You will:
- Design and build the sandboxing platform for code execution across containerized and virtualized environments.
- Ensure strong isolation, security, and reproducibility of execution across user sessions and workloads.
- Optimize for cold-start latency, memory footprint, and resource utilization at scale.
- Collaborate across security, infra, and product teams to support both internal research use cases and enterprise customer deployments.
- Lead architecture reviews and own projects from design through deployment in fast-paced, cross-functional settings.
Ideally you'd have:
- 3+ years of experience building high-performance systems software (e.g., OS, container runtime, VMM, networking stack).
- Deep understanding of Linux internals, including process isolation, memory management, cgroups, and namespaces.
- Experience with containerization and virtualization technologies (e.g., Docker, Firecracker, gVisor, QEMU, Kata Containers).
- Proficiency in a systems programming language such as Go, Rust, or C/C++.
- Familiarity with networking, security hardening, sandboxing techniques, and kernel-level performance tuning.
- Comfort working across infrastructure layers, from kernel modules to orchestration frameworks (e.g., Kubernetes).
- Strong debugging skills and the ability to make performance/security tradeoffs in production systems.
Nice to haves:
- Familiarity with LLM agents and agent frameworks (e.g., OpenHands, Agent2Agent, MCP).
- Experience running secure workloads in multi-tenant or untrusted environments (e.g., FaaS, CI sandboxes, remote notebooks).
- Exposure to snapshotting and restore techniques (e.g., CRIU, VM snapshots, overlayfs).