
Senior Platform Engineer

Quantiphi, Trenton, New Jersey, United States


Overview

Senior Platform Engineer role at Quantiphi

About Quantiphi

Quantiphi is an award-winning Applied AI and Big Data software and services company, driven by a deep desire to solve transformational problems at the heart of businesses. Our signature approach combines groundbreaking machine-learning research with disciplined cloud and data-engineering practices to create breakthrough impact at unprecedented speed. Quantiphi has seen 2.5x year-over-year growth since its inception in 2013. Headquartered in Boston, with 4,000+ professionals across the globe, Quantiphi leverages Applied AI technologies across multiple industry verticals (Telco, BFSI, HCLS, etc.) and is an established Elite/Premier Partner of NVIDIA, Google Cloud, AWS, Snowflake, and others.

We’ve been recognized with:

- 17x Google Cloud Partner of the Year awards in the last 8 years
- 3x AWS AI/ML award wins
- 3x NVIDIA Partner of the Year titles
- 2x Snowflake Partner of the Year awards
- Top analyst recognitions from Gartner, ISG, and Everest Group
- Great Place to Work certification for 2021, 2022, and 2023

Our solutions span Healthcare, Financial Services, Consumer Goods, Manufacturing, and more, powered by Generative AI and Agentic AI accelerators. Be part of a trailblazing team shaping the future of AI, ML, and cloud innovation. Your next big opportunity starts here!

Experience Level

4+ years

Employment Type

Full Time

Location

Remote: Dallas, TX / Bedminster, NJ

Role Overview

Quantiphi is seeking an experienced Platform Engineer with expertise in MLOps and distributed systems, particularly Kubernetes, and a strong background in managing multi-GPU, multi-node deep learning job and inference scheduling. The ideal candidate is proficient in Linux (Ubuntu) systems, able to write intricate shell scripts, experienced with configuration management tools, and familiar with deep learning workflows.

Key Responsibilities

- Orchestrating LLM Workflows & Development: Design, implement, and scale the underlying platform that supports GenAI workloads for real-time or batch processing. Workloads may range from fine-tuning and distillation to inference.
- LLMOps (LLM Operations): Build and manage pipelines for training, fine-tuning, and deploying LLMs such as Llama, Mistral, GPT-3/4, BERT, or similar. Ensure smooth integration of these models into production.
- GPU Optimization: Optimize GPU utilization and resource management for AI workloads, enabling efficient scaling, low latency, and high throughput in training and inference. Manage multi-GPU systems and apply LLM parallelization techniques and other inference optimizations.
- Infrastructure Design & Automation: Design, deploy, and automate scalable, secure, and cost-effective infrastructure for training and running AI models. Work with cloud providers (AWS, GCP, Azure) to provision resources, implement auto-scaling, and manage distributed training environments.
- Platform Reliability & Monitoring: Implement robust monitoring to track the performance, health, and efficiency of AI models and workflows. Troubleshoot in real time and optimize system performance for seamless operations. ML/GenAI monitoring experience preferred.
- Database Knowledge: Knowledge of database concepts (performance tuning, RBAC, sharding), with exposure to relational, object, and vector databases preferred.
- Collaboration with AI/ML Teams: Work with data scientists, ML engineers, and product teams to meet platform requirements for AI model deployment and experimentation.
- Security & Compliance: Ensure platform infrastructure is secure and compliant with policies and best practices for AI model deployment.

Basic Qualifications

- 3+ years of experience in platform engineering, DevOps, or systems engineering with a focus on machine learning and AI workloads.
- Experience with LLM workflows and GPU-based ML infrastructure.
- Hands-on experience managing distributed computing systems, training large-scale models, and deploying AI systems in cloud environments.
- Strong knowledge of GPU architectures (e.g., NVIDIA A100, V100), multi-GPU systems, and optimization techniques for AI workloads.

Good to Have

- Experience building or managing ML platforms for generative AI models or large-scale NLP tasks.
- Familiarity with distributed computing frameworks (e.g., Dask, MPI, PyTorch DDP) and data pipeline orchestration tools (e.g., AWS Glue, Apache Airflow).
- Knowledge of AI model deployment frameworks such as TensorFlow Serving, TorchServe, vLLM, and Triton Inference Server.
- Understanding of LLM inference and self-managed infrastructure optimization.
- Understanding of AI model explainability, fairness, and ethical AI considerations.
- Experience automating and scaling deployment of AI models on global infrastructure.
- Experience with NVIDIA ecosystem tools (Triton Inference Server, CUDA, NVAIE, TensorRT, NeMo, etc.).
- Proficiency with Kubernetes (GPU Operator), Linux, and AI deployment & experimentation tools.

What’s in it for YOU at Quantiphi

- Access to state-of-the-art GPU infrastructure on the cloud and on-premises.
- Be part of the fastest-growing AI-first digital transformation and engineering company in the world.
- Opportunity to work with Fortune 500 AI leaders and disruptors transforming their business with Generative AI.
- Strong peer learning and career growth in Applied AI and GPU Computing.
- Recognition as part of a team with NVIDIA AI Services Partner of the Year awards.
