Harper

AI Research Engineer

Harper, San Francisco, California, United States, 94199


The Mission

We're building an AI-powered insurance brokerage that's transforming the $900 billion commercial insurance market. Fresh off our $8M seed round, we need an exceptional AI Research Engineer to push the boundaries of what's possible with AI agents in the insurance domain. You'll sit at the intersection of cutting-edge research and practical implementation, directly improving the intelligence and reliability of the AI systems that are replacing the industry's pre-internet infrastructure.

You'll own the research and development of our model evaluation, alignment, and inference systems, ensuring our AI agents deliver reliable, accurate, domain-specific responses across complex insurance workflows. This includes building our "MLLM-as-a-Judge" evaluation infrastructure, implementing post-training workflows (RLHF, DPO, RLAIF), optimizing inference infrastructure for production scale, and establishing rigorous benchmarking processes for insurance-specific reasoning tasks.

We're committed to "Staying REAL" with our AI systems - Reliable, Experience-focused, Accurate, and Low latency. You'll work directly with the CEO and CTO to rapidly experiment, evaluate, and deploy improvements to our AI agents.
We live by "There is no try, there is just do" - we need researchers who ship production improvements, not just run experiments.

Outcomes You'll Drive

Evaluation & Benchmarking Infrastructure
- Design and build our "MLLM-as-a-Judge" evaluation system for automated, scalable feedback on insurance domain tasks
- Establish comprehensive benchmarking processes comparing foundation models on underwriting, risk assessment, and policy recommendation tasks
- Create insurance-specific evaluation datasets with human annotations for supervised fine-tuning and reinforcement learning
- Analyze model failure modes in insurance contexts and provide actionable improvements to the engineering team
- Build real-time monitoring systems to track model performance degradation in production

Model Alignment & Post-Training
- Implement and optimize post-training workflows including RLHF, DPO, GRPO, and RLAIF for insurance domain alignment
- Experiment with inference-time alignment techniques (prompt engineering, RAG, ICL) to improve accuracy on insurance-specific queries
- Build training automation pipelines and dashboards for reproducible experiments and rapid iteration
- Fine-tune models for critical insurance tasks: underwriter reasoning, risk assessment, policy matching
- Develop custom reward models specific to insurance domain requirements

Inference Optimization & Deployment
- Deploy and optimize models using state-of-the-art inference frameworks (vLLM, TensorRT-LLM, TGI)
- Implement advanced serving strategies including continuous batching, PagedAttention, and speculative decoding
- Optimize GPU memory utilization through quantization (INT8/INT4), KV-cache optimization, and tensor parallelism
- Build multi-model serving infrastructure supporting model routing based on task complexity
- Implement model cascading strategies to balance latency, cost, and accuracy
- Design and deploy A/B testing infrastructure for gradual model rollouts
- Create comprehensive monitoring for inference metrics: p50/p95/p99 latencies, throughput, GPU
utilization
- Build automatic failover and model fallback systems for high availability

Research to Production Pipeline
- Translate the latest research in LLM evaluation, alignment, and efficient inference into production-ready solutions
- Develop custom evaluation metrics for insurance-specific tasks (accuracy in premium calculations, regulatory compliance, risk assessment)
- Build automated performance regression testing for model updates
- Create feedback loops from production data to continuously improve model performance
- Implement online learning capabilities for real-time model adaptation

Cross-functional Impact
- Partner with platform engineers to integrate evaluation and inference systems into the AI Grid infrastructure
- Collaborate with product teams to define SLA requirements and success metrics for AI agents
- Work with forward-deployed engineers to understand latency constraints and optimization opportunities
- Build cost models to optimize inference spend while maintaining quality standards

You're Our Person If

Core Requirements
- Deep understanding of transformer architectures and experience training/fine-tuning large language models
- Hands-on experience with post-training techniques (RLHF, DPO, Constitutional AI, RLAIF)
- Proven track record building evaluation systems and working with human-annotated datasets
- Experience with distributed training on multi-GPU/multi-node systems
- Strong proficiency with PyTorch and ecosystem tools (Torchtune, FSDP, DeepSpeed)
- Experience deploying models to production with modern inference servers (vLLM, TensorRT-LLM, TGI, Triton)
- Experience translating research papers into production implementations
- Comfort with both research experimentation and production engineering
- You ship improvements daily and measure impact rigorously

Inference & Optimization Expertise
- Deep understanding of GPU architecture and CUDA programming for performance optimization
- Experience with model quantization techniques (GPTQ, AWQ, SmoothQuant, INT8/INT4)
- Hands-on experience with efficient attention
mechanisms (FlashAttention, PagedAttention, GQA, MQA)
- Knowledge of advanced serving optimizations (continuous batching, dynamic batching, request scheduling)
- Experience with model compression techniques (distillation, pruning, low-rank adaptation)
- Proficiency in profiling and optimizing GPU kernels for inference workloads
- Understanding of distributed inference patterns (tensor parallelism, pipeline parallelism)
- Experience with edge deployment and model optimization for resource-constrained environments

Technical Expertise
- Experience building automated evaluation pipelines and "LLM-as-a-Judge" systems
- Understanding of inference optimization techniques beyond quantization (speculative decoding, guided generation)
- Familiarity with RAG systems and context engineering for domain-specific applications
- Experience with experiment tracking and reproducible research infrastructure
- Knowledge of prompt engineering and few-shot learning techniques
- Understanding of multimodal models and cross-modal alignment (nice to have)
- Experience with cost-performance optimization for large-scale inference

Mindset & Approach
- You default to action: run the experiment instead of debating the approach
- You're comfortable with ambiguity and can define your own success metrics
- You balance research rigor with practical impact and shipping velocity
- You obsess over latency budgets and cost-per-token metrics
- You can explain complex ML concepts to both technical and non-technical stakeholders
- You're excited about applying AI to transform a massive, traditional industry

Hard Requirements
- 3+ years of experience in ML research or research engineering roles
- Proven track record of improving LLM performance through alignment techniques
- Demonstrated experience deploying models to production at scale
- Experience optimizing inference for latency-sensitive applications
- Strong understanding of GPU architecture and parallel computing
- Experience with Python and modern ML frameworks
- Published research or significant open-source contributions
(bonus)
- Experience with insurance or financial services domain (bonus)
- Must be based in San Francisco and work in-office 5.5 days per week (relocation assistance provided)

Nice to Have
- Contributions to open-source inference frameworks (vLLM, TGI, etc.)
- Experience with custom CUDA kernel development
- Knowledge of compiler optimizations for ML workloads
- Experience with federated learning or privacy-preserving ML techniques
- Background in formal verification or model safety

What We Offer
- Opportunity to define how AI transforms a $900 billion industry
- Direct impact on systems processing billions in insurance premiums
- Work with cutting-edge AI technology and infrastructure
- Extremely high ownership and autonomy
- Competitive compensation with significant equity
- Relocation assistance to San Francisco

This position offers a unique opportunity to bridge cutting-edge AI research with massive real-world impact. You'll push the boundaries of what's possible with AI agents while ensuring they operate reliably at scale in a highly regulated industry. If you're passionate about making AI systems faster, smarter, and more reliable, we want to talk to you.