Acceler8 Talent
Distributed Systems Engineer
Acceler8 Talent, San Francisco, California, United States, 94199
A company building frontier-scale AI models that automate software engineering and AI research, combining ultra-long context, domain-specific RL, and massive compute infrastructure, is looking for a Distributed Systems Engineer to join its team.
Base pay range: $225,000/yr – $550,000/yr
What Will I Be Doing:
Design and build distributed data and coordination systems that enable ultra-long-context model training and inference
Develop high-performance storage and caching systems to support large-scale GPU workloads
Work deep in the internals of modern deep learning frameworks in highly distributed environments
Build automation for fault detection, recovery and high availability across GPU clusters
Troubleshoot complex, cross-stack issues spanning GPUs, networking, storage, operating systems and cloud infrastructure
What We’re Looking For:
Deep expertise in distributed systems design and public cloud platforms
Proven experience designing and operating highly available, high-throughput data systems
Strong knowledge of distributed databases, batch or stream processing systems, and/or distributed file systems
Exceptional problem-solving ability across the full systems stack
A hands-on mindset with the curiosity and grit to learn fast in a frontier technical environment
What’s In It for Me:
Salary of $225K–$550K dependent on experience + significant equity
Great benefits, including a 401(k) with 6% company match, comprehensive health coverage, and unlimited PTO
Visa sponsorship and SF relocation stipend available
Well-funded ($465M+) with backing from top investors
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology
Industries: IT Services and IT Consulting
Location: San Francisco, CA
Apply now for immediate consideration!