Alembic Limited

Applications Engineer (GPU-Accelerated)

Alembic Limited, San Francisco, California, United States, 94199


About Alembic

Alembic is pioneering a revolution in marketing: proving the true ROI of marketing activities. The Alembic Marketing Intelligence Platform applies sophisticated algorithms and AI models to solve this long-standing attribution problem. When you join the Alembic team, you'll help build the tools that provide unprecedented visibility into how marketing drives revenue, helping a growing list of Fortune 500 companies make more confident, data-driven decisions.

About the Role

We're looking for a Machine Learning Applications Engineer with GPU, Python, and C++ expertise to help productionize cutting-edge causal AI models. You'll work closely with ML scientists to turn experimental research code into optimized, scalable, and well-structured software that powers Alembic's real-time analytics and inference systems.

This is a hands-on, performance-focused role where you'll operate at the intersection of applied ML, systems engineering, and high-performance computing.
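For a purely illustrative flavor of this kind of performance work (a hypothetical sketch, not Alembic code — all function names here are invented), consider a common pattern in the Python data stack: replacing an interpreted per-element loop with a vectorized NumPy expression.

```python
import numpy as np

def normalize_loop(x: np.ndarray) -> np.ndarray:
    """Naive z-score normalization with Python-level loops (slow)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    out = np.empty_like(x, dtype=np.float64)
    for i, v in enumerate(x):
        out[i] = (v - mean) / var ** 0.5
    return out

def normalize_vectorized(x: np.ndarray) -> np.ndarray:
    """Same computation pushed into NumPy's compiled kernels: no Python loop."""
    return (x - x.mean()) / x.std()
```

On large arrays the vectorized form is typically orders of magnitude faster; the same principle of keeping hot loops out of the interpreter carries over to GPU frameworks such as CUDA, Triton, and Numba.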

Key Responsibilities

- Translate early-stage ML research and prototypes into reliable, testable, and performant software components
- Use CUDA, Triton, and Numba to optimize GPU-accelerated workloads for inference and preprocessing
- Contribute to core libraries and performance-critical routines using modern C++ in hybrid Python/C++ environments
- Develop modular, reusable infrastructure that supports deployment of ML workloads at scale
- Collaborate with data scientists and engineers to optimize data structures, memory usage, and execution paths
- Build interfaces and APIs to integrate ML components into Alembic's broader platform
- Implement logging, profiling, and observability tools to track performance and model behavior

Must-Have Qualifications

- 4-7 years of software engineering experience, including substantial time in Python and C++
- Hands-on experience with GPU programming, including CUDA, Triton, Numba, or related frameworks
- Strong familiarity with the Python data stack (Pandas, NumPy, PyArrow) and low-level performance tuning
- Experience writing high-performance, memory-efficient code in C++
- Demonstrated ability to work cross-functionally with researchers, platform engineers, and product teams
- Comfort transforming research-grade ML code into maintainable, production-grade software

Nice-to-Have

- Experience with hybrid Python/C++ or Python/CUDA extension development (e.g., Pybind11, Cython, custom ops)
- Familiarity with ML serving or inference tools (e.g., TorchServe, ONNX Runtime, Triton Inference Server)
- Exposure to structured data modeling, causal inference, or large-scale statistical computation
- Background in distributed systems or parallel processing

What You'll Get

- A pivotal role building GPU-accelerated software at the heart of a real-world AI product
- Collaboration with an elite team of ML scientists, engineers, and product leaders
- The opportunity to shape performance-critical infrastructure powering enterprise decision-making
- A culture rooted in technical rigor, curiosity, and product impact