Elliot Partnership

Lead Platform Engineer (Private Cloud / Bare Metal)

Elliot Partnership, New York, New York, US, 10261

New York, NY (Hybrid, 3 days in office)

Highly competitive compensation package

Join an elite technology and research group at the forefront of global finance, where world-class engineering and quantitative research converge to solve some of the most complex problems in any industry. Their teams are composed of passionate problem-solvers who operate in a dynamic, large‑scale IT environment. We are seeking a visionary engineer to lead the architectural evolution of the firm’s massive, on‑premise private cloud, ensuring their complex trading and research platforms operate with maximum performance, scalability, and resilience.

The Role

We are seeking a deeply experienced Systems Engineer to act as a Tech Lead for key infrastructure initiatives. This is a crucial, hands‑on role for a hybrid systems and software engineer who thrives on solving complex distributed systems problems at scale. You will be the key technical leader responsible for architecting and building the robust, automated platforms that underpin the firm’s critical operations. Your primary mandate will be to lead the transition of the firm’s dev teams from direct‑access bare metal to a secure, managed, and automated container platform. You will act as a force multiplier for the engineering organization by leading high‑impact projects, mentoring other engineers, and setting the standard for technical excellence in reliability and performance.

Responsibilities

Architect the Private Cloud: Lead the design and execution of high‑impact projects for a distributed fleet of 10,000+ compute servers. You will drive decisions on hardware specifications, OS provisioning, and file system tuning to maximize performance on bare metal.

Build the Future Platform: Lead the greenfield design and implementation of a Kubernetes‑based container platform on bare metal. You will replace manual workflows with a structured, declarative system that empowers researchers while ensuring stability.

Eliminate Operational Toil: Architect, build, and maintain mission‑critical tools and automation in Python or Go. You will move beyond scripting to build resilient APIs, CLI tools, and automation frameworks that eliminate manual operational work at its source.

Solve Deep Technical Challenges: Serve as a senior escalation point for complex Linux systems issues, diagnosing and resolving deep technical challenges related to kernel‑level performance, hardware/OS compatibility, and reliable configuration distribution across multiple data centers.

Define Observability Strategy: Drive the architecture for a modern observability data pipeline, deciding what to store, where to store it, and how to use it for automated remediation so that production environments remain performant.

Technical Leadership: Mentor and guide other engineers, championing best practices in software development, infrastructure management, and site reliability engineering.

What you’ll bring

7+ years of experience in a senior site reliability, infrastructure, or software engineering role with a track record of success in complex, large‑scale environments.

Deep, hands‑on expertise with the Linux operating system. You can explain system calls, file descriptors, memory management, and disk I/O paths at a granular level to debug performance issues.

Expert‑level proficiency in Python or Go, with a proven track record of engineering libraries, tools, or API services (not just scripting).

Experience designing and building Kubernetes clusters on bare metal (not just using EKS/GKE). You understand the deep architectural trade‑offs of CNI networking, CSI storage, and control plane design.

Demonstrated experience leading technical projects, driving architectural decisions, and mentoring other engineers through complex migrations.
