Databricks Inc.
Staff Software Engineer - GenAI inference
Databricks Inc., San Francisco, California, United States, 94199
P-1285
About This Role
As a staff software engineer for GenAI inference, you will lead the architecture, development, and optimization of the inference engine that powers the Databricks Foundation Model API. You'll bridge research advances and production demands, ensuring high throughput, low latency, and robust scaling. Your work will span the full GenAI inference stack: kernels, runtimes, orchestration, memory management, and integration with serving frameworks.
What You Will Do
Own and drive the architecture, design, and implementation of the inference engine, collaborating on a model-serving stack optimized for large-scale LLM inference.
Partner closely with researchers to bring new model architectures or features (sparsity, activation compression, mixture-of-experts) into the engine.
Lead end‑to‑end optimization of latency, throughput, memory efficiency, and hardware utilization across GPUs and other accelerators.
Define standards for building and maintaining instrumentation, profiling, and tracing tooling that uncovers bottlenecks and guides optimization.
Architect scalable routing, batching, scheduling, memory management, and dynamic loading mechanisms for inference workloads.
Ensure reliability, reproducibility, and fault tolerance in the inference pipelines, including A/B launches, rollback, and model versioning.
Collaborate cross‑functionally to integrate with federated, distributed inference infrastructure—orchestrating across nodes, balancing load, and handling communication overhead.
Drive cross‑team collaboration with platform engineers, cloud infrastructure, and security/compliance teams.
Represent the team externally through benchmarks, whitepapers, and open‑source contributions.
What We Look For
BS/MS/PhD in Computer Science or a related field.
Strong software engineering background (6+ years or equivalent) in performance‑critical systems.
Proven track record of owning complex system components and driving architectural decisions end‑to‑end.
Deep understanding of ML inference internals: attention, MLPs, recurrent modules, quantization, sparse operations, etc.
Hands‑on experience with CUDA, GPU programming, and key libraries (cuBLAS, cuDNN, NCCL, etc.).
Strong background in distributed systems design, including RPC frameworks, queuing, request batching, sharding, and memory partitioning.
Demonstrated ability to uncover and solve performance bottlenecks across layers (kernel, memory, networking, scheduler).
Experience building instrumentation, tracing, and profiling tools for ML models.
Ability to lead through influence, working closely with ML researchers to translate novel model ideas into production systems.
Excellent communication and leadership skills, with a proactive and ownership‑driven mindset.
Bonus: published research or open‑source contributions in ML systems, inference optimization, or model serving.
Pay Range Transparency
Databricks is committed to fair and equitable compensation practices. The pay range for this role is:
Local Pay Range: $190,900 – $232,800 USD
About Databricks
Databricks is the data and AI company. Over 10,000 organizations worldwide, including Comcast, Condé Nast, Grammarly, and more than 50% of the Fortune 500, rely on the Databricks Data Intelligence Platform to unify and democratize data, analytics, and AI. Databricks is headquartered in San Francisco, with offices worldwide.
Benefits
We offer comprehensive benefits and perks that meet the needs of all of our employees. For specific details on the benefits offered in your region, please visit https://www.mybenefitsnow.com/databricks.
Our Commitment to Diversity and Inclusion
We are committed to fostering a diverse and inclusive culture where everyone can excel. Hiring practices are inclusive and meet equal employment opportunity standards. Individuals seeking employment are considered without regard to age, color, disability, ethnicity, family or marital status, gender identity or expression, language, national origin, physical or mental ability, political affiliation, race, religion, sexual orientation, socio‑economic status, veteran status, and other protected characteristics.
All employment decisions are made based on skills, experience, and qualifications.