Lenovo

AI Systems Engineer

Lenovo, Raleigh, North Carolina, United States

Responsibilities

End-to-end performance analysis: Analyze performance of LLM and agentic workloads across the full stack: models, runtimes, compilers, kernels, memory, interconnect, and distributed deployment.

Model- and context-aware tuning: Characterize and optimize performance for models of varying size and context length, including tradeoffs around batch size, KV-cache management, quantization, and latency vs. throughput.

Memory and microarchitectural analysis: Profile memory usage and access patterns across CPU, GPU, and accelerators; identify bottlenecks related to cache behavior, memory bandwidth, and compute utilization; propose and validate optimizations.

Networking and distributed systems: Study and improve performance in heterogeneous distributed systems (multi-node, multi-accelerator) under different networking conditions (latency, bandwidth, congestion); tune sharding, pipelining, and routing strategies.

Benchmarking methodology: Design, implement, and maintain benchmarks and load tests for LLM and agentic workloads under realistic traffic patterns and SLAs.

Optimization and experimentation: Collaborate with ML, platform, and infrastructure teams to prototype and roll out optimizations (e.g., kernel-level improvements, scheduling changes, batching policies, caching strategies).

Observability and capacity planning: Build and refine dashboards, alerts, and reports that surface key performance and efficiency metrics; provide data-driven guidance for capacity planning and hardware selection.

Cross-functional collaboration: Work closely with model, runtime, and platform teams to translate performance findings into architectural improvements and product-impacting changes.

Qualifications

2+ years of industry experience in systems performance engineering, ML infrastructure, HPC, or related fields.

Master’s degree or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a related technical field.

Strong understanding of computer architecture: CPU/GPU pipelines, caches, memory hierarchies, vector/SIMD, and accelerators.

Experience profiling and optimizing performance of complex systems using tools such as perf, VTune, Nsight, rocprof, or similar.

Strong coding skills in C++ and/or Python.

Experience working with Linux-based systems, shell scripting, and standard tooling.

Familiarity with containerized environments and orchestration (e.g., Docker, Kubernetes).

Experience working with ML workloads (preferably deep learning) in frameworks like PyTorch, TensorFlow, or JAX.

Conceptual understanding of LLM inference, including batching, token generation, and context window behavior.

Understanding of distributed systems concepts (RPC, load balancing, fault tolerance) and basic networking fundamentals (latency, bandwidth, throughput).

Strong data analysis skills; comfortable working with logs, traces, and metrics.

Ability to clearly communicate findings and tradeoffs to both engineering and non-engineering stakeholders.

Hands‑on experience optimizing LLM inference or other large‑scale deep learning workloads on GPUs or specialized accelerators.

Experience with heterogeneous systems (e.g., mixtures of CPU, GPU, NPU/ASIC) and cluster‑scale deployment.

Familiarity with LLM‑specific optimization techniques (KV cache strategies, quantization, tensor/sequence parallelism, speculative decoding, etc.).

Experience with large‑scale observability stacks (Prometheus, Grafana, OpenTelemetry) for performance monitoring.

Prior work on high‑performance computing (HPC), networking‑intensive systems, or real‑time/low‑latency services.

We are an Equal Opportunity Employer and do not discriminate against any employee or applicant for employment because of race, color, sex, age, religion, sexual orientation, gender identity, national origin, veteran status, disability, or any other federal, state, or local protected class.
