Quadric
AI Inference Engineer
Quadric has created an innovative general‑purpose neural processing unit (GPNPU) architecture. Quadric's co‑optimized software and hardware are designed to run neural network (NN) inference workloads in a wide variety of edge and endpoint devices, ranging from battery‑operated smart‑sensor systems to high‑performance automotive and autonomous vehicle systems. Unlike other NPUs or neural network accelerators in the industry today, which can accelerate only a portion of a machine learning graph, the Quadric GPNPU executes both NN graph code and conventional C++ DSP and control code.
Role
The AI Inference Engineer at Quadric is the key bridge between the world of AI/LLM models and Quadric's unique platforms. The engineer will (1) port AI models to the Quadric platform; (2) optimize model deployment for efficient inference; and (3) profile and benchmark model performance. This senior technical role demands deep knowledge of AI model algorithms, system architecture, and AI toolchains/frameworks.
Responsibilities
Quantize, prune and convert models for deployment
Port models to the Quadric platform using the Quadric toolchain
Benchmark and profile model performance and accuracy
Develop tools to scale and speed up deployment
Improve the SDK and runtime
Provide technical support and documentation to customers and the developer community
Requirements
Bachelor’s or Master’s degree in Computer Science and/or Electrical Engineering
5+ years of experience in AI/LLM model inference and deployment frameworks/tools
Experience with model quantization (PTQ, QAT) and related tools
Experience with model accuracy metrics
Experience with model inference performance profiling
Experience with at least one of the following frameworks: ONNX Runtime, PyTorch, vLLM, Hugging Face Transformers, Neural Compressor, llama.cpp
Proficiency in C/C++ and Python
Strong problem‑solving, debugging, and communication skills
Benefits
Life Insurance (Basic, Voluntary & AD&D)
Paid Time Off (Vacation, Sick & Public Holidays)
Family Leave (Maternity, Paternity)
Short Term & Long Term Disability
Training & Development
Work From Home
Free Food & Snacks
Stock Option Plan
Seniority Level: Mid‑Senior level
Employment Type: Full‑time
Job Function: Computer Hardware Manufacturing and Software Development