Robots and Pencils
AI Engineer (AI System Calibration & Optimization)
Robots and Pencils, Seattle, Washington, US, 98127
Location
Seattle, WA (Remote Friendly)
Position
Robots & Pencils is seeking an outcome-oriented AI Engineer to partner with a strategic client on a high-impact AI system calibration and optimization engagement. You’ll embed directly with the client’s AI and product engineering teams to improve the accuracy, reliability, and transparency of their Azure-hosted, fine-tuned GPT model through systematic prompt optimization and RAG calibration.
Role Overview
As an AI Engineer, you’ll serve as a technical thought partner, actively coding and drawing on your software engineering experience to build calibration pipelines, optimize prompts using prompt optimization frameworks, and establish repeatable improvement workflows. You’ll work on-site with the client, driving measurable outcomes that maximize their AI system performance.
Key Responsibilities
Client Engagement & Solution Development
Embed with a strategic client as their technical partner for AI system calibration and prompt optimization.
Build production-grade calibration systems using Python within the client's Azure environment.
Implement the DSPy framework and GEPA optimizer to systematically improve prompt quality and retrieval performance.
Create evaluation frameworks to measure model accuracy, precision/recall, latency, and hallucination rates.
Architect prompt optimization pipelines for retrieval, context synthesis, and answer generation tailored to client needs.
Own the path to production - evaluation pipelines, Azure ML workflows, KPI dashboards, and optimization automation.
Iterate rapidly based on client feedback and KPI results; translate business goals into technical calibration improvements.
Outcome Ownership & Business Impact
Own end-to-end delivery of calibration systems from initial baseline to production-ready optimization workflows.
Establish measurable KPIs and demonstrate accuracy improvements, latency reduction, and hallucination mitigation.
Provide strategic guidance on RAG architecture improvements and retrieval parameter optimization.
Accelerate client time-to-value through hands-on development and comprehensive knowledge transfer.
Deliver operational playbooks and documentation enabling the client team to maintain calibration systems independently.
Lead complex, multi-stakeholder calibration initiatives on-site and remotely; drive clarity, remove blockers, and keep execution on track.
Set coding standards and architectural patterns for calibration components; write clear docs, runbooks, and technical specifications.
Mentor client engineers through code reviews, pairing sessions, and technical workshops on DSPy, GEPA, and evaluation best practices.
Make sound tradeoffs under real-world constraints - Azure cost optimization, data quality, performance requirements, and security.
Align delivery with Robots & Pencils' responsible AI practices and client governance requirements.
Cross-Functional Collaboration
Work closely with the client's AI SMEs and product engineering teams to understand product catalog structure and validation workflows.
Collaborate with internal R&P product, engineering, and delivery teams on calibration methodology and best practices.
Share insights from client engagement to improve R&P's prompt optimization frameworks and tooling.
Contribute reusable patterns, evaluation frameworks, and documentation back to R&P's core platform.
Collaborate across time zones with distributed teams.
Required Skills & Qualifications
Bachelor's degree in Computer Science, Engineering, or equivalent experience.
7+ years of professional software development with significant ownership of architecture and delivery.
3+ years of Python in ML/AI systems with a strong focus on data processing and evaluation pipelines.
2+ years building with Generative AI including hands-on prompt engineering and optimization work.
Experience with prompt optimization frameworks - DSPy strongly preferred, or similar systematic approaches to prompt improvement.
Deep understanding of RAG architectures - retrieval quality, latency/cost tuning, hallucination mitigation, and evaluation methods.
Hands-on experience designing evaluation metrics and building assessment frameworks for LLM systems.
Knowledge of systematic experimentation methods - A/B testing, parameter tuning, performance benchmarking.
Experience with data curation, labeling workflows, and dataset quality management for AI systems.
Strong Azure cloud experience with focus on AI/ML services - Azure Machine Learning, Azure AI Search, Azure OpenAI Service.
Experience with Azure Data Labeling, Azure Blob Storage, and Azure infrastructure fundamentals.
Understanding of vector search platforms and retrieval optimization (Azure AI Search, Weaviate, Qdrant, Pinecone).
Strong IaC background (Terraform or ARM templates) plus containerization and distributed systems knowledge.
Solid SDLC practices - testing strategies, CI/CD, code reviews, observability, and operational excellence.
Upper-intermediate English for client communication.
Experience leading complex technical projects with multiple stakeholders.
Strong communication skills for technical and executive audiences.
Ability to context-switch and adapt to client environments.
Willingness to travel to client sites.
Nice to Have
Direct hands-on experience with the DSPy framework and GEPA optimizer.
Understanding of systematic optimization principles: evolutionary algorithms, Bayesian optimization, multi-objective optimization, and Pareto efficiency concepts.
Familiarity with prompt optimization frameworks and methods - experience with any of: MIPROv2, TextGrad, EvoPrompt, AutoPrompt, or reinforcement learning approaches (GRPO, PPO).
Experience with LLM-as-judge patterns and automated evaluation pipelines.
Knowledge of advanced RAG patterns - Adaptive RAG, Self-RAG, Corrective RAG - and retrieval evaluation methods (MRR, NDCG, precision@k).
Understanding of agentic AI patterns - ReAct, Chain-of-Thought, Tool Use - and their application in RAG systems.
Experience building evaluation dashboards with Azure Monitor, Application Insights, or similar observability tools.
Familiarity with MLOps practices - model versioning, experiment tracking, metric logging for evaluation systems.
Experience with AWS or GCP AI/ML platforms (Bedrock, SageMaker, Vertex AI) and cross-cloud architecture patterns.
Experience with product catalog systems, cross-reference matching, or e-commerce search optimization.
Background in manufacturing, industrial equipment, or technical specification systems.
Prior consulting or professional services experience with enterprise clients.
Accountability
– Owns the full client engagement cycle with quality, reliability, and attention to detail.
Adaptability
– Thrives in dynamic, fast-paced client environments.
Collaboration
– Builds strong partnerships across teams and time zones.
Execution-Focused
– Delivers maintainable, scalable solutions without overengineering.
Innovation-Minded
– Brings curiosity and experimentation to technology decisions.
Craftsmanship
– Cares deeply about documentation and code quality, architecture, and user experience.
Why Join Robots & Pencils?
We don’t just ship features; we build digital-first products that matter. As a Senior Forward Deploy Engineer, you’ll join a team that values deep craft, cross-functional collaboration, and relentless focus on quality. You’ll work on impactful agentic AI applications using modern technologies, while influencing engineering culture and best practices across the organization.