Positron

Sr Software Engineer

Positron, Liberty Lake, Washington, United States, 99019


Positron.ai specializes in developing custom hardware systems to accelerate AI inference. These inference systems offer significant performance and efficiency gains over traditional GPU-based systems, delivering advantages in both performance per dollar and performance per watt. Positron exists to create the world's best AI inference systems.

Senior Software Engineer: Machine Learning Systems & High-Performance LLM Inference

We are seeking a Senior Software Engineer to contribute to the development of high-performance software that powers the execution of open-source large language models (LLMs) on our custom appliance. This appliance leverages a combination of FPGAs and x86 CPUs to accelerate transformer-based models. The software stack is written primarily in modern C++ (C++17/20) and relies heavily on templates, SIMD optimizations, and efficient parallel computing techniques.

Key Areas of Focus & Responsibilities

Design and implement high-performance inference software for LLMs on custom hardware.
Develop and optimize C++-based libraries that make efficient use of SIMD instructions, threading, and the memory hierarchy.
Work closely with FPGA and systems engineers to ensure efficient data movement and computational offloading between x86 CPUs and FPGAs.
Speed up model execution through low-level optimizations, including vectorization, cache efficiency, and hardware-aware scheduling.
Contribute to performance profiling tools and methodologies to analyze execution bottlenecks at the instruction and data-flow levels.
Apply NUMA-aware memory management techniques to optimize memory access patterns for large-scale inference workloads.
Implement ML system-level optimizations such as token streaming, KV cache optimizations, and efficient batching for transformer execution.
Collaborate with ML researchers and software engineers to integrate model quantization techniques, sparsity optimizations, and mixed-precision execution.
Ensure all code contributions include unit, performance, acceptance, and regression tests as part of a continuous-integration-based development process.

Required Skills & Experience

7+ years of professional experience in C++ software development, with a focus on performance-critical applications.
Strong understanding of C++ templates and modern memory management.
Hands-on experience with SIMD programming (AVX-512, SSE, or equivalent) and intrinsics-based vectorization.
Experience in high-performance computing (HPC), numerical computing, or ML inference optimization.
Experience with ML model execution optimizations, including efficient tensor computations and memory access patterns.
Knowledge of multi-threading, NUMA architectures, and low-level CPU optimization.
Proficiency with systems-level software development, profiling tools (Perfetto, VTune, Valgrind), and benchmarking.
Experience working with hardware accelerators (FPGAs, GPUs, or custom ASICs) and designing efficient software-hardware interfaces.

Preferred Skills (Nice to Have)

Familiarity with LLVM/Clang or GCC compiler optimizations.
Experience in LLM quantization, sparsity optimizations, and mixed-precision computation.
Knowledge of distributed inference techniques and networking optimizations.
Understanding of graph partitioning and execution scheduling for large-scale ML models.

Why Join Us?

Work on a cutting-edge ML inference platform that redefines performance and efficiency for LLMs.
Tackle challenging low-level performance engineering problems in AI and HPC.
Collaborate with a team of hardware, software, and ML experts building an industry-first product.
Contribute to, and help shape, the future of open-source AI inference software.
