DigitalOcean

Senior Engineer II, Inference Service

DigitalOcean, Denver, Colorado, United States, 80285


We are seeking an experienced Software Engineer to drive the design, optimization, and scaling of our inference systems. In this role, you will build systems for optimized inference serving of well-known open-source / open-weights models, develop novel techniques for serving custom models, and scale the platform to handle millions of users across the globe. As a Senior Software Engineer at DigitalOcean, you will join a dynamic team dedicated to revolutionizing cloud computing and AI.

What You’ll Do

- Design and implement a distributed inference platform for serving large language models
- Optimize the runtime and infrastructure layers of the inference stack for best model performance
- Build native cross-platform inference support across NVIDIA and AMD GPUs for a variety of model architectures
- Contribute to open-source inference engines to make them perform better on the DigitalOcean cloud
- Build tooling and observability to monitor system health, and build auto-tuning capabilities
- Build benchmarking frameworks to test model-serving performance and guide system and infrastructure tuning efforts
- Mentor engineers on inference systems, GPU infrastructure, and distributed inference best practices

Required Qualifications

- Bachelor’s or Master’s in Computer Science, Electrical Engineering, or a related field
- Experience as a tech lead
- Experience building distributed systems with Kubernetes, gRPC, Golang, and Python
- Experience with GPU programming with CUDA and ROCm
- Experience with L3-L7 network protocols, block storage, and object storage
- Experience building multi-tenant systems: identity management, tenant isolation, etc.
- Proven track record defining and achieving performance KPIs (latency, throughput, cost)

Preferred Qualifications

- Experience with one or more inference engines: vLLM, SGLang, Modular, etc.
- Familiarity with distributed inference serving frameworks: llm-d, NVIDIA Dynamo, Ray Serve, etc.
- Knowledge of distributed inference optimization techniques: tensor/data parallelism, KV cache optimizations, smart routing, etc.
- Familiarity with common LLM architectures and inference optimization techniques (e.g., continuous batching, quantization)
- Experience with GPU interconnect technologies such as NVLink, XGMI, and RoCE
- Open-source contributions to inference libraries, frameworks, model kernels, etc.

Why You’ll Like Working For DigitalOcean

- We innovate with purpose. You’ll be part of a cutting-edge technology company with an upward trajectory, proud to simplify cloud and AI so builders can spend more time creating software that changes the world. You will be a Shark who thinks big, bold, and scrappy, like an owner with a bias for action and a strong sense of responsibility for customers, products, employees, and decisions.
- We prioritize career development. You’ll do the best work of your career and grow with support from our organizational development resources and access to training.
- We care about your well-being. We offer a competitive benefits portfolio, with details varying by location.
- We reward our employees. Salary range: $140,000 - $175,000, with potential bonuses and equity compensation; eligibility for the Employee Stock Purchase Program.
- We value diversity and inclusion. We are an equal-opportunity employer and do not discriminate on the basis of protected characteristics.

This is a remote role.

Details

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Supply Chain, Information Technology, and Engineering
Industries: Internet Publishing
