Harper

AI Research Engineer

Harper, San Francisco, California, United States, 94199


The Mission

We're building an AI-powered insurance brokerage that's transforming the $900 billion commercial insurance market. Fresh off our $8M seed round, we need an exceptional AI Research Engineer to push the boundaries of what's possible with AI agents in the insurance domain. You'll be at the intersection of cutting-edge research and practical implementation, directly improving the intelligence and reliability of our AI systems that are replacing pre-internet infrastructure.

You'll own the research and development of our model evaluation, alignment, and inference systems, ensuring our AI agents deliver reliable, accurate, and domain-specific responses across complex insurance workflows. This includes building our "MLLM-as-a-Judge" evaluation infrastructure, implementing post-training workflows (RLHF, DPO, RLAIF), optimizing inference infrastructure for production scale, and establishing rigorous benchmarking processes for insurance-specific reasoning tasks. We're committed to "Staying REAL" with our AI systems: Reliable, Experience-focused, Accurate, and Low latency. You'll work directly with the CEO and CTO to rapidly experiment, evaluate, and deploy improvements to our AI agents. We live by "There is no try, there is just do" - we need researchers who ship production improvements, not just run experiments.

Outcomes You'll Drive

Evaluation & Benchmarking Infrastructure
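To give a concrete flavor of this evaluation work: the core of an "MLLM-as-a-Judge" system is a judge model that scores agent outputs against a rubric at scale. A minimal sketch follows, with the judge call stubbed out; the rubric, function names, and JSON schema here are illustrative assumptions, not Harper's actual system.

```python
# Minimal "LLM-as-a-Judge" scoring loop. `call_judge` is a stub standing in
# for a real judge-model API call; everything here is illustrative.
import json

RUBRIC = (
    "Score the answer 1-5 for factual accuracy on this insurance question.\n"
    "Question: {question}\nAnswer: {answer}\n"
    'Reply as JSON: {{"score": <int>, "reason": "<short justification>"}}'
)

def call_judge(prompt: str) -> str:
    """Stub for the judge model; a real system would query an LLM here."""
    return json.dumps({"score": 4, "reason": "mostly correct premium math"})

def judge_answer(question: str, answer: str) -> dict:
    """Format the rubric, ask the judge, and validate its structured verdict."""
    prompt = RUBRIC.format(question=question, answer=answer)
    verdict = json.loads(call_judge(prompt))
    assert 1 <= verdict["score"] <= 5  # reject malformed judge output
    return verdict

def mean_score(pairs: list[tuple[str, str]]) -> float:
    """Aggregate judge scores over a (question, answer) evaluation set."""
    scores = [judge_answer(q, a)["score"] for q, a in pairs]
    return sum(scores) / len(scores)
```

A real pipeline would add retries on malformed JSON, multiple judge samples per item to reduce variance, and periodic human audits of the judge itself.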

- Design and build our "MLLM-as-a-Judge" evaluation system for automated, scalable feedback on insurance domain tasks
- Establish comprehensive benchmarking processes comparing foundation models on underwriting, risk assessment, and policy recommendation tasks
- Create insurance-specific evaluation datasets with human annotations for supervised fine-tuning and reinforcement learning
- Analyze model failure modes in insurance contexts and provide actionable improvements to the engineering team
- Build real-time monitoring systems to track model performance degradation in production

Model Alignment & Post-Training
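Of the post-training methods this work covers, DPO is the simplest to state: it pushes the policy to prefer the chosen completion over the rejected one, relative to a frozen reference model. A plain-Python sketch of the per-pair loss; a real implementation would operate on batched PyTorch tensors.

```python
# Per-pair DPO loss. Inputs are summed token log-probabilities of the chosen
# and rejected completions under the policy and the frozen reference model.
import math

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """-log sigmoid(beta * margin), where margin measures how much more the
    policy (versus the reference) prefers the chosen completion."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy matches the reference the loss is log 2; a growing positive margin drives it toward zero, and beta controls how hard the policy is allowed to drift from the reference.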

- Implement and optimize post-training workflows including RLHF, DPO, GRPO, and RLAIF for insurance domain alignment
- Experiment with inference-time alignment techniques (prompt engineering, RAG, ICL) to improve accuracy on insurance-specific queries
- Build training automation pipelines and dashboards for reproducible experiments and rapid iteration
- Fine-tune models for critical insurance tasks: underwriter reasoning, risk assessment, policy matching
- Develop custom reward models specific to insurance domain requirements

Inference Optimization & Deployment
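The p50/p95/p99 latency monitoring this work calls for reduces to percentile computation over a window of request timings. A nearest-rank sketch is below; production stacks typically approximate percentiles from histograms instead (as Prometheus does), since exact sorting over high-volume streams is costly.

```python
# Nearest-rank percentile summary over a window of request latencies.
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample covering p% of the data."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Summarize a latency window into the usual SLA percentiles."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```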

- Deploy and optimize models using state-of-the-art inference frameworks (vLLM, TensorRT-LLM, TGI)
- Implement advanced serving strategies including continuous batching, PagedAttention, and speculative decoding
- Optimize GPU memory utilization through quantization (INT8/INT4), KV-cache optimization, and tensor parallelism
- Build multi-model serving infrastructure supporting model routing based on task complexity
- Implement model cascading strategies to balance latency, cost, and accuracy
- Design and deploy A/B testing infrastructure for gradual model rollouts
- Create comprehensive monitoring for inference metrics: p50/p95/p99 latencies, throughput, GPU utilization
- Build automatic failover and model fallback systems for high availability

Research to Production Pipeline
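One piece of this pipeline, automated performance regression testing, can be as simple as a gate that compares a candidate model's benchmark scores against the incumbent before rollout. The metric names and tolerance below are illustrative, not a prescribed spec.

```python
# Regression gate for model updates: block deployment if the candidate drops
# more than `tolerance` below the baseline on any tracked benchmark metric.
def passes_regression_gate(baseline: dict[str, float],
                           candidate: dict[str, float],
                           tolerance: float = 0.01) -> bool:
    """A metric missing from the candidate counts as a failure (score 0)."""
    return all(candidate.get(metric, 0.0) >= score - tolerance
               for metric, score in baseline.items())
```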

- Translate the latest research in LLM evaluation, alignment, and efficient inference into production-ready solutions
- Develop custom evaluation metrics for insurance-specific tasks (accuracy in premium calculations, regulatory compliance, risk assessment)
- Build automated performance regression testing for model updates
- Create feedback loops from production data to continuously improve model performance
- Implement online learning capabilities for real-time model adaptation

Cross-functional Impact
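Cost modeling for inference often starts from a back-of-the-envelope identity: GPU hourly price divided by sustained throughput gives cost per token. A sketch, with illustrative numbers rather than Harper's actual figures:

```python
# Back-of-the-envelope inference cost model: one GPU at a sustained
# generation throughput, converted to USD per million tokens.
def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_sec: float) -> float:
    tokens_per_hour = tokens_per_sec * 3600.0
    return gpu_hourly_usd / tokens_per_hour * 1_000_000
```

For example, a $3.60/hr GPU sustaining 1,000 tokens/s works out to $1.00 per million tokens, which is why batching and quantization gains feed directly into the cost line.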

- Partner with platform engineers to integrate evaluation and inference systems into the AI Grid infrastructure
- Collaborate with product teams to define SLA requirements and success metrics for AI agents
- Work with forward-deployed engineers to understand latency constraints and optimization opportunities
- Build cost models to optimize inference spend while maintaining quality standards

You're Our Person If

Core Requirements

- Deep understanding of transformer architectures and experience training/fine-tuning large language models
- Hands-on experience with post-training techniques (RLHF, DPO, Constitutional AI, RLAIF)
- Proven track record building evaluation systems and working with human-annotated datasets
- Experience with distributed training on multi-GPU/multi-node systems
- Strong proficiency with PyTorch and ecosystem tools (Torchtune, FSDP, DeepSpeed)
- Experience deploying models to production with modern inference servers (vLLM, TensorRT-LLM, TGI, Triton)
- Experience translating research papers into production implementations
- Comfort with both research experimentation and production engineering
- You ship improvements daily and measure impact rigorously

Inference & Optimization Expertise

- Deep understanding of GPU architecture and CUDA programming for performance optimization
- Experience with model quantization techniques (GPTQ, AWQ, SmoothQuant, INT8/INT4)
- Hands-on experience with efficient attention mechanisms (FlashAttention, PagedAttention, GQA, MQA)
- Knowledge of advanced serving optimizations (continuous batching, dynamic batching, request scheduling)
- Experience with model compression techniques (distillation, pruning, low-rank adaptation)
- Proficiency in profiling and optimizing GPU kernels for inference workloads
- Understanding of distributed inference patterns (tensor parallelism, pipeline parallelism)
- Experience with edge deployment and model optimization for resource-constrained environments

Technical Expertise

- Experience building automated evaluation pipelines and "LLM-as-a-Judge" systems
- Understanding of inference optimization techniques beyond quantization (speculative decoding, guided generation)
- Familiarity with RAG systems and context engineering for domain-specific applications
- Experience with experiment tracking and reproducible research infrastructure
- Knowledge of prompt engineering and few-shot learning techniques
- Understanding of multimodal models and cross-modal alignment (nice to have)
- Experience with cost-performance optimization for large-scale inference

Mindset & Approach

- You default to action: run the experiment instead of debating the approach
- You're comfortable with ambiguity and can define your own success metrics
- You balance research rigor with practical impact and shipping velocity
- You obsess over latency budgets and cost-per-token metrics
- You can explain complex ML concepts to both technical and non-technical stakeholders
- You're excited about applying AI to transform a massive, traditional industry

Hard Requirements

- 3+ years of experience in ML research or research engineering roles
- Proven track record of improving LLM performance through alignment techniques
- Demonstrated experience deploying models to production at scale
- Experience optimizing inference for latency-sensitive applications
- Strong understanding of GPU architecture and parallel computing
- Experience with Python and modern ML frameworks
- Published research or significant open-source contributions (bonus)
- Experience with insurance or financial services domain (bonus)
- Must be based in San Francisco and work in-office 5.5 days per week (relocation assistance provided)

Nice to Have

- Contributions to open-source inference frameworks (vLLM, TGI, etc.)
- Experience with custom CUDA kernel development
- Knowledge of compiler optimizations for ML workloads
- Experience with federated learning or privacy-preserving ML techniques
- Background in formal verification or model safety

What We Offer

- Opportunity to define how AI transforms a $900 billion industry
- Direct impact on systems processing billions in insurance premiums
- Work with cutting-edge AI technology and infrastructure
- Extremely high ownership and autonomy
- Competitive compensation with significant equity
- Relocation assistance to San Francisco

This position offers the unique opportunity to bridge cutting-edge AI research with massive real-world impact. You'll push the boundaries of what's possible with AI agents while ensuring they operate reliably at scale in a highly regulated industry. If you're passionate about making AI systems faster, smarter, and more reliable, we want to talk to you.
