On-Device AI Runtime Engineer
Mindlance - Cupertino, California, United States, 95014
Overview
You will be expected to:
- Design and implement robust Core ML model optimization pipelines for deploying large-scale ML models on resource-constrained devices
- Support product engineering teams by consulting on AI model performance, iterating on inference solutions to solve real-world mobile/edge AI problems, and developing/delivering custom on-device AI frameworks
- Interface with hardware and platform teams to ensure optimal utilization of neural processing units (NPUs), GPUs, and specialized AI accelerators across the device ecosystem (see the sketch below)
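To illustrate the compute-unit utilization mentioned in the last responsibility, here is a minimal Swift sketch (not part of the posting's requirements) of steering a Core ML model toward the Neural Engine while letting the framework fall back to GPU/CPU; `modelURL` is a hypothetical path supplied by the caller.

```swift
import CoreML

// Minimal sketch: let Core ML schedule work across CPU, GPU, and the
// Neural Engine; the framework decides the actual per-layer placement.
// `modelURL` is a hypothetical path to a compiled .mlmodelc bundle.
func loadModel(at modelURL: URL) throws -> MLModel {
    let config = MLModelConfiguration()
    config.computeUnits = .all   // allow CPU + GPU + Neural Engine
    return try MLModel(contentsOf: modelURL, configuration: config)
}
```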
Minimum Qualifications:
- Strong proficiency in Swift/Objective-C and Metal Performance Shaders
- Familiarity with various ML model formats such as Core ML, ONNX, TensorFlow Lite, and PyTorch Mobile
- Strong critical thinking, performance optimization, and low-level system design skills
- Experience with model quantization, pruning, and hardware-aware neural architecture optimization (see the sketch below)
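As a purely illustrative note on the quantization item above, the Swift sketch below shows the arithmetic behind symmetric int8 weight quantization; production pipelines would rely on framework tooling rather than hand-rolled code like this.

```swift
import Foundation

// Illustrative symmetric int8 quantization of a weight tensor:
// map floats in [-maxAbs, +maxAbs] onto the int8 range via one scale factor.
func quantizeSymmetricInt8(_ weights: [Float]) -> (values: [Int8], scale: Float) {
    let maxAbs = weights.map { abs($0) }.max() ?? 0
    let scale: Float = maxAbs > 0 ? maxAbs / 127.0 : 1.0
    let values = weights.map { Int8(clamping: Int(($0 / scale).rounded())) }
    return (values, scale)
}

// Dequantization recovers approximate weights; the error per element is
// bounded by half a quantization step (scale / 2).
func dequantize(_ values: [Int8], scale: Float) -> [Float] {
    values.map { Float($0) * scale }
}
```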
Preferred Qualifications:
- Experience with real-time inference pipelines and latency-critical AI applications
- Understanding of mobile device thermal management, power consumption patterns, and compute resource allocation for AI workloads (see the sketch below)
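For the thermal-management point, one common pattern is to adapt compute-unit selection to the system's reported thermal state. The mapping below is an assumption made for this sketch, not a policy stated in the posting.

```swift
import Foundation
import CoreML

// Hypothetical policy: scale back AI compute as the device heats up.
func preferredComputeUnits(for state: ProcessInfo.ThermalState) -> MLComputeUnits {
    switch state {
    case .nominal, .fair:
        return .all                   // full CPU + GPU + Neural Engine
    case .serious:
        return .cpuAndNeuralEngine    // ease GPU contention under thermal pressure
    case .critical:
        return .cpuOnly               // minimize power draw
    @unknown default:
        return .cpuOnly
    }
}

// Callers can observe ProcessInfo.thermalStateDidChangeNotification and
// rebuild their MLModelConfiguration when the state changes.
```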