Site Reliability Engineer - (Linux & Python/Go)
Elliot Partnership - New York
New York, NY (Hybrid, 3 days in office)
Highly competitive compensation package
Join an elite technology and research group at the forefront of global finance, where world-class engineering and quantitative research converge to solve some of the most complex problems in any industry. Its teams are composed of passionate problem-solvers who operate in a dynamic, large-scale IT environment. We are seeking a visionary engineer to lead critical reliability and automation initiatives, ensuring the firm's complex trading and research platforms operate with maximum performance, scalability, and resilience.
The Role:
We are seeking a deeply experienced Site Reliability Engineer to act as a Tech Lead for key infrastructure initiatives. This is a crucial, hands-on role for a hybrid systems and software engineer who thrives on solving complex problems at scale. You will be a key technical leader responsible for architecting and building the robust, automated systems that underpin the firm's critical operations. You will act as a force multiplier for the engineering organization by leading high-impact projects, mentoring other engineers, and setting the standard for technical excellence in reliability and performance.
Responsibilities:
- Lead the design and execution of high-impact projects focused on improving the reliability, scalability, and performance of the firm's core infrastructure.
- Architect, build, and maintain mission-critical tools and automation in Python or Go to eliminate operational toil and enhance system capabilities.
- Serve as a senior escalation point for complex Linux systems issues, diagnosing and resolving deep technical challenges related to performance, configuration, and stability.
- Drive the architecture for scalable, resilient, and performant infrastructure, making key design decisions for production environments.
- Mentor and guide other engineers, championing best practices in software development, infrastructure management, and site reliability.
Requirements:
- 7+ years of experience in a senior site reliability, infrastructure, or software engineering role with a track record of success in complex, large-scale environments.
- Expert-level proficiency in Python or Go, with a proven track record of engineering libraries, tools, or applications (not just scripting).
- Deep, hands-on expertise with the Linux operating system, including performance tuning, troubleshooting, and systems administration in a large-scale environment.
- Demonstrated experience leading technical projects, driving architectural decisions, and mentoring other engineers.
- Strong knowledge of CI/CD, infrastructure-as-code (Ansible, Terraform), and containerization (Docker, Kubernetes).
- Exceptional communication skills, with the ability to articulate complex technical concepts to a variety of audiences.