Lightning AI
AI Performance Optimization Engineer
Lightning AI, San Francisco, California, United States, 94199
Overview
Lightning AI is the company reimagining the way AI is built. After creating and releasing PyTorch Lightning in 2019, Lightning AI was launched to reshape the development of artificial intelligence products for commercial and academic use. We are on a mission to simplify AI development, making it accessible to everyone, from solo researchers to large enterprises. Our platform is built to scale with the latest AI advancements while staying intuitive and adaptable, so you can bring your ideas to life. We have offices in New York City, San Francisco, and London and are backed by investors such as Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.
Our values: Move Fast, Focus, Balance, Craftsmanship, Minimal.
What We’re Looking For
We are seeking a highly skilled AI Performance Optimization Engineer to optimize training and inference workloads on compute accelerators and clusters through the Lightning Thunder compiler and the broader PyTorch Lightning ecosystem. This role sits at the intersection of deep learning research, compiler development, and large-scale system optimization. You will shape technology that pushes the boundaries of model performance and efficiency, creating foundational software that impacts the entire machine learning ecosystem.
This is a hybrid role based in either our New York City or San Francisco office, with an in-office requirement of two days per week. The salary range for this role is $120,000-$240,000.
What you'll do
Develop performance-oriented model optimizations at multiple levels:
Graph-level (operator fusion, kernel scheduling, memory planning)
Kernel-level (CUDA, Triton, custom operators for specialized hardware; a minimal Triton sketch follows this list)
System-level (distributed training across GPUs/TPUs, inference serving at scale; see the Trainer sketch at the end of this section)
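To ground the kernel-level bullet, here is a minimal sketch of the kind of custom kernel this work involves: a masked, block-parallel elementwise add written with Triton's standard API. It is illustrative only; real kernel-level work targets far more complex fusions and memory layouts.

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the final partial block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)  # launch enough blocks to cover all elements
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out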
Advance the Thunder compiler by building optimization passes, graph transformations, and integration hooks to accelerate training and inference workloads.
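By way of illustration, the user-facing entry point for this work is thunder.jit, which traces a module into a functional representation that optimization passes then rewrite. A minimal sketch, assuming the thunder.jit and thunder.last_traces APIs from the lightning-thunder README (exact names may vary by version; the model here is a placeholder):

import torch
import thunder

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
)
jitted = thunder.jit(model)   # trace, transform, and compile the module
x = torch.randn(8, 1024)
out = jitted(x)               # first call triggers tracing and compilation

# Inspect the final trace that the optimization passes produced.
print(thunder.last_traces(jitted)[-1])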
Work across the software stack to ensure optimizations are accessible to end users through clean APIs, automated tooling, and seamless integration with PyTorch Lightning.
Design and implement profiling and debugging tools to analyze model execution, identify bottlenecks, and guide optimization strategies.
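For example, a first-pass bottleneck hunt can be sketched with PyTorch's built-in torch.profiler (standard PyTorch API; the tooling built in this role would go well beyond this):

import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(2048, 2048).cuda()
x = torch.randn(64, 2048, device="cuda")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
             record_shapes=True) as prof:
    for _ in range(10):
        model(x)

# Rank operators by GPU time to surface fusion and scheduling opportunities.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))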
Collaborate with hardware vendors and ecosystem partners to ensure Thunder runs efficiently across diverse backends (NVIDIA, AMD, TPU, specialized accelerators).
Contribute to open-source projects by developing new features, improving documentation, and supporting community adoption.
Engage with researchers and engineers in the community, providing guidance on performance tuning and advocating for Thunder as the go-to optimization layer in ML workflows.
Work cross-functionally with Lightning's product and engineering teams to ensure compiler and optimization improvements align with the broader product vision.
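To make the system-level bullet above concrete, here is a minimal, self-contained sketch of scaling training with PyTorch Lightning's Trainer. TinyModule and the synthetic dataset are illustrative placeholders, not part of the posting:

import lightning as L
import torch
from torch.utils.data import DataLoader, TensorDataset

class TinyModule(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.net(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=1e-2)

# Synthetic data stands in for a real dataset.
train_loader = DataLoader(TensorDataset(torch.randn(256, 32), torch.randn(256, 1)),
                          batch_size=32)

trainer = L.Trainer(
    accelerator="gpu",
    devices=4,               # data parallelism across four GPUs
    strategy="ddp",          # DistributedDataParallel
    precision="bf16-mixed",  # mixed precision cuts memory and bandwidth
)
trainer.fit(TinyModule(), train_loader)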
What you'll need
Strong expertise with deep learning frameworks such as PyTorch, JAX, or TensorFlow.
Hands-on experience with model optimization techniques, including graph-level optimizations, quantization, pruning, mixed precision, or memory-efficient training (a minimal mixed-precision sketch follows this list).
Deep understanding of compiler internals (IR design, operator fusion, scheduling, optimization passes) or proven work in performance-critical software.
Experience with CUDA, Triton, or other GPU programming models for developing custom kernels.
Knowledge of distributed systems and parallelism strategies (data/model/pipeline parallelism, checkpointing, elastic scaling).
Familiarity with software engineering practices: designing APIs, building robust tooling, testing, CI/CD for performance-sensitive systems.
Proven track record contributing to open-source projects in ML, HPC, or compiler domains.
Excellent collaboration and communication skills, with the ability to partner across research, engineering, and external contributors.
Bachelor's degree in Computer Science, Engineering, or a related field. Advanced degree (Master's or PhD) in machine learning, compilers, or systems highly preferred.
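As referenced in the optimization-techniques requirement above, here is a minimal mixed-precision training step using PyTorch's standard torch.autocast and torch.amp.GradScaler APIs (an illustrative sketch; the model and data are placeholders):

import torch

model = torch.nn.Linear(1024, 1024).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.amp.GradScaler("cuda")
x = torch.randn(32, 1024, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = model(x).square().mean()  # forward runs in fp16 where safe

scaler.scale(loss).backward()        # scale the loss to avoid fp16 underflow
scaler.step(opt)                     # unscale gradients, then step
scaler.update()
opt.zero_grad()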
Benefits and Perks
Competitive base salaries and stock options with a 25% one-year cliff and monthly vesting thereafter.
For our international employees, Velocity Global payroll and equitable benefits across the globe.
In the US, we offer: Medical, dental and vision; Life and AD&D insurance; Flexible paid time off plus 1 week of winter closure; Generous paid family leave benefits; $500 monthly meal reimbursement; $500 one-time home office stipend; $1,000 annual learning & development stipend; 100% Citibike membership (NYC only); $45/month gym membership; additional medical and mental health services.
Inclusion note: Lightning AI is committed to fostering an inclusive and diverse workplace. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity, national origin, age, disability, veteran status, or any other protected characteristic.