Lenovo
Director, AI Model Deployment and Optimization
Lenovo, Raleigh, North Carolina, United States
Position Summary
Lenovo is seeking a visionary technical leader to head our AI Model Deployment and Optimization team. In this high-impact role, you will drive the development, optimization, and large‑scale deployment of cutting‑edge AI capabilities across Lenovo devices and platforms, from on‑device inference to cloud‑enabled workloads. You will be responsible for adapting, fine‑tuning, and optimizing open‑source and proprietary foundation models for performance, efficiency, and user impact, ensuring they run seamlessly across a range of computing environments and hardware architectures.
Your team will sit at the intersection of AI software, hardware acceleration, and product innovation, pushing the boundaries of model compression, quantization, pruning, distillation, and hardware‑aware AI optimization. This is a unique opportunity to shape how AI reaches hundreds of millions of users globally.
Key Responsibilities
Lead and scale Lenovo’s AI model deployment and optimization strategy across devices, laptops, and cloud environments.
Adapt, fine‑tune, and optimize open‑source and proprietary foundation models (e.g., OpenAI models, Google Gemma) for Lenovo’s product portfolio.
Drive initiatives in model compression, quantization, pruning, and distillation to achieve maximum efficiency on constrained devices while preserving model quality.
Oversee performance evaluation, benchmarking, and iterative improvement cycles for large language models, vision models, and multimodal AI.
Collaborate closely with hardware architecture teams to align AI model efficiency with device and accelerator capabilities.
Develop hardware‑aware optimization algorithms and integrate them into model deployment pipelines.
Partner with global engineering, research, and product teams to bring optimized AI‑powered features (e.g., “Catch Me Up”) to market.
Establish and maintain reproducible workflows, automation pipelines, and release‑readiness criteria for AI models.
Represent Lenovo in AI model optimization research communities, technical working groups, and industry consortiums.
Build, mentor, and inspire a high‑performance applied AI engineering team.
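To make the compression work above concrete: post‑training quantization maps floating‑point weights onto low‑bit integers, trading a small amount of accuracy for large memory and latency savings on constrained devices. The sketch below is a minimal, hypothetical illustration of symmetric per‑tensor int8 quantization, not Lenovo tooling or a production pipeline.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: map floats onto [-127, 127].

    The scale is chosen so the largest-magnitude value lands exactly on
    +/-127; every weight is then rounded to the nearest integer step.
    """
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # guard all-zero input
    quantized = [round(v / scale) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats; error per value is at most scale / 2."""
    return [q * scale for q in quantized]
```

In practice this step is handled by framework tooling (e.g., PyTorch or ONNX Runtime quantization utilities), often with per‑channel scales and calibration data, but the core idea is this scale‑and‑round mapping.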
Required Qualifications
Experience:
10+ years in production software development, including AI/ML engineering, with 5+ years in leadership roles. Proven track record in model deployment, optimization, and benchmarking at scale. Demonstrated ability to deliver production‑grade AI models optimized for both on‑device and cloud environments.
Optimization Techniques:
Strong expertise in quantization, pruning, distillation, graph optimization (ONNX, TensorRT), mixed precision, and hardware‑specific tuning (GPUs, TPUs, custom accelerators).
Inference Systems:
Experience with low‑latency serving, batching strategies, caching, and dynamic scaling across clusters.
Cloud and Edge Deployment:
Deep knowledge of end‑to‑end deployment of ML/LLM models. Proven ability to deliver across environments — cloud (AWS/GCP/Azure), hybrid, and edge devices.
Tooling and Frameworks:
Familiarity with PyTorch, TensorFlow, JAX, ONNX Runtime, TensorRT, TVM, and model compilation stacks.
Data and Telemetry:
Experience building feedback loops from runtime telemetry to guide retraining, routing, and optimization.
Leadership and Communication:
Excellent leadership, communication, and cross‑functional collaboration skills.
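The inference-systems qualification above centers on low‑latency serving with batching: accumulating concurrent requests and running them through the model as one batch to raise hardware utilization. The following is a deliberately simplified, hypothetical sketch of that idea (all names are illustrative; real serving stacks add latency deadlines, queues per model, and async scheduling).

```python
from collections import deque

class MicroBatcher:
    """Toy dynamic batcher: collects requests and runs them as one batch
    when max_batch_size is reached, or when flush() is called (standing
    in for a latency-deadline timer in a real server)."""

    def __init__(self, run_batch, max_batch_size=8):
        self.run_batch = run_batch          # callable: list of inputs -> list of outputs
        self.max_batch_size = max_batch_size
        self.pending = deque()              # (request_id, input) pairs awaiting execution

    def submit(self, request_id, x):
        """Queue a request; returns completed results if the batch filled."""
        self.pending.append((request_id, x))
        if len(self.pending) >= self.max_batch_size:
            return self.flush()
        return {}

    def flush(self):
        """Run all pending requests as a single batch; return id -> output."""
        if not self.pending:
            return {}
        ids, inputs = zip(*self.pending)
        self.pending.clear()
        outputs = self.run_batch(list(inputs))
        return dict(zip(ids, outputs))
```

The design choice illustrated here, flushing on whichever comes first of batch size or deadline, is the standard latency/throughput trade‑off that production servers such as Triton expose as configuration.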
Preferred Qualifications
Graduate degree (MS or PhD) in Computer Science, AI/ML, Computational Engineering, or related field.
Experience delivering AI features in consumer electronics or embedded platforms.
Familiarity with AI compilation stacks (e.g., TVM, MLX, Core ML Tools).
Track record of collaboration with research institutions and contributions to open‑source AI optimization libraries.
Security and Compliance: Ensuring secure deployments, model integrity verification, and adherence to privacy regulations.
Primary contributions to an AI optimization/compression framework or toolset.
We are an Equal Opportunity Employer and do not discriminate against any employee or applicant for employment because of race, color, sex, age, religion, sexual orientation, gender identity, national origin, status as a veteran, disability, or any other federal, state, or local protected class.