ExlService Holdings, Inc.
Location: Exl - Utah - UT (Work From Home)
Job Role: Application Development - Applications Development Engineering
Experience (in years): 9-12
Job Description

Key Responsibilities
1. RAG Development & Optimization
- Design and implement Retrieval-Augmented Generation (RAG) pipelines to ground LLMs in enterprise or domain-specific data (a minimal retrieval sketch follows this list).
- Make strategic decisions on chunking strategy, embedding models, and retrieval mechanisms to balance context precision, recall, and latency.
- Work with vector databases (Qdrant, Weaviate, pgvector, Pinecone) and embedding frameworks (OpenAI, Hugging Face, Instructor, etc.).
- Diagnose and iterate on challenges such as chunk-size trade-offs, retrieval quality, context window limits, and grounding accuracy, using structured evaluation and metrics.
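By way of illustration, here is a minimal sketch of that retrieval step: fixed-size chunking with overlap, embedding, and an in-memory cosine-similarity search. The sentence-transformers model name is an illustrative placeholder, and a production pipeline would swap the in-memory index for a vector database such as Qdrant or pgvector.

```python
# Minimal RAG retrieval sketch: chunk -> embed -> cosine-similarity search.
# Illustrative only; assumes `sentence-transformers` and `numpy` are installed.
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Fixed-size character chunking with overlap; chunk size vs. recall
    is exactly the kind of trade-off this role is expected to tune."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

def build_index(docs: list[str]) -> tuple[list[str], np.ndarray]:
    chunks = [c for d in docs for c in chunk(d)]
    vecs = model.encode(chunks, normalize_embeddings=True)  # unit vectors
    return chunks, np.asarray(vecs)

def retrieve(query: str, chunks: list[str], vecs: np.ndarray, k: int = 3) -> list[str]:
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = vecs @ q  # cosine similarity, since vectors are normalized
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]
```

The retrieved chunks would then be concatenated into the prompt that grounds the LLM's answer.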
2. Chatbot Quality & Evaluation Frameworks
- Establish comprehensive evaluation frameworks for LLM applications, combining quantitative methods (BLEU, ROUGE, response time) with qualitative methods (human evaluation, LLM-as-a-judge, relevance, coherence, user satisfaction).
- Implement continuous monitoring and automated regression testing using tools such as LangSmith, LangFuse, Arize, or custom evaluation harnesses (one such harness is sketched after this list).
- Identify and prevent quality degradation, hallucinations, and factual inconsistencies before production release.
- Collaborate with design and product to define success metrics and user feedback loops for ongoing improvement.
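As an example of the "custom evaluation harness" end of that spectrum, here is a hedged sketch of a regression gate with a pluggable judge. The keyword judge is a trivial stand-in for an LLM-as-a-judge call, and the field names and threshold are assumptions for illustration.

```python
# Minimal regression-testing harness sketch for an LLM app.
# The judge below is a placeholder; in practice it would be an
# LLM-as-a-judge call or a reference metric such as ROUGE.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    question: str
    expected_facts: list[str]  # facts the answer must mention

def keyword_judge(answer: str, case: EvalCase) -> float:
    """Score = fraction of expected facts present (stand-in judge)."""
    hits = sum(f.lower() in answer.lower() for f in case.expected_facts)
    return hits / len(case.expected_facts)

def run_regression(app: Callable[[str], str],
                   cases: list[EvalCase],
                   judge: Callable[[str, EvalCase], float] = keyword_judge,
                   threshold: float = 0.8) -> bool:
    """Gate a release on the mean score across the regression set."""
    scores = [judge(app(c.question), c) for c in cases]
    mean = sum(scores) / len(scores)
    print(f"mean score: {mean:.2f} over {len(cases)} cases")
    return mean >= threshold
```

Wired into CI, a harness of this shape is what catches quality degradation or factual drift before a release ships.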
3. Guardrails, Safety & Responsible AI
- Implement multi-layered guardrails across input validation, output filtering, prompt engineering, re-ranking, and abstention (“I don’t know”) strategies (a layered sketch follows this list).
- Use frameworks such as Guardrails AI, NeMo Guardrails, or Llama Guard to ensure compliance, safety, and brand integrity.
- Build policy-driven safety systems for handling sensitive data, user content, and edge cases, with clear escalation paths.
- Balance safety, user experience, and helpfulness, knowing when to block, rephrase, or gracefully decline a response.
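A minimal sketch of what "multi-layered" can mean in code: an input-validation layer, a grounding-confidence abstention layer, and an output filter. The regex, threshold, and block list are illustrative assumptions rather than a production policy; frameworks like Guardrails AI or NeMo Guardrails would replace the hand-rolled checks.

```python
# Layered guardrails sketch: input validation -> generation -> output filter,
# with abstention when retrieval confidence is low. Policies are illustrative.
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")        # toy PII pattern (US SSN)
BLOCKED_TERMS = {"internal-only", "confidential"}    # placeholder block list

def validate_input(user_msg: str) -> str | None:
    """Layer 1: reject or redact unsafe input before it reaches the model."""
    if SSN_RE.search(user_msg):
        return "I can't process messages containing personal identifiers."
    return None

def guarded_answer(user_msg: str, generate, retrieval_score: float) -> str:
    if (refusal := validate_input(user_msg)) is not None:
        return refusal
    # Layer 2: abstain rather than hallucinate on weak grounding.
    if retrieval_score < 0.3:  # threshold is an assumption to tune
        return "I don't know based on the documents I have access to."
    answer = generate(user_msg)
    # Layer 3: filter the model's output before it reaches the user.
    if any(t in answer.lower() for t in BLOCKED_TERMS):
        return "I can't share that. Escalating to a human reviewer."
    return answer
```

The ordering matters: cheap deterministic checks run first, so the expensive model call is only made for traffic that passes them.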
4. Multi-Agent Systems & Orchestration
- Design and operate multi-agent workflows using orchestration frameworks such as LangGraph, AutoGen, CrewAI, or Haystack.
- Coordinate routing logic, task delegation, and parallel vs. sequential agent execution to handle complex reasoning and multi-step tasks (see the routing sketch after this list).
- Build observability and debugging tools for tracking agent interactions, performance, and cost.
- Evaluate trade-offs around latency, reliability, and scalability in production-grade multi-agent environments.
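Here is a framework-agnostic sketch of that routing and parallel-vs-sequential decision. The agent names and routing rule are illustrative assumptions; a real deployment would express the same shape as a LangGraph or AutoGen workflow.

```python
# Framework-agnostic multi-agent routing sketch: a router picks agents,
# independent agents run in parallel, dependent steps run sequentially.
import asyncio

async def search_agent(task: str) -> str:
    return f"[search] results for: {task}"    # stand-in agent

async def summarize_agent(task: str) -> str:
    return f"[summary] of: {task}"            # stand-in agent

def route(task: str) -> list:
    """Toy routing rule; production routers are often LLM- or policy-driven."""
    return [search_agent, summarize_agent] if "report" in task else [search_agent]

async def orchestrate(task: str) -> str:
    agents = route(task)
    # Fan out independent agents in parallel to cut latency,
    partials = await asyncio.gather(*(a(task) for a in agents))
    # then run the dependent aggregation step sequentially.
    return await summarize_agent(" | ".join(partials))

print(asyncio.run(orchestrate("quarterly report on churn")))
```

The router roughly corresponds to a conditional edge in a graph-based orchestrator, which is where the latency/reliability trade-offs named above get decided.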
Qualifications

Minimum Qualifications
- 10+ years of experience in Data Science, Data Engineering, or Machine Learning.
- Bachelor’s Degree in Computer Science, Information Systems, or a related field.
- Proficiency in Python (FastAPI, Flask, asyncio); GCP experience is good to have.
- Demonstrated hands-on RAG implementation experience with specific tools, models, and evaluation metrics.
- Practical knowledge of agentic frameworks (LangGraph, LangChain) and evaluation ecosystems (LangFuse, LangSmith).
- Excellent communication skills, a proven ability to collaborate cross-functionally, and a low-ego, ownership-driven work style.
Preferred / Good-to-Have Qualifications
- Experience in traditional AI/ML workflows, e.g., model training, feature engineering, and deployment of ML models (scikit-learn, TensorFlow, PyTorch).
- Familiarity with retrieval optimization, prompt tuning, and tool-use evaluation.
- Background in observability and performance profiling for large-scale AI systems.
- Understanding of security and privacy principles for AI systems (PII redaction, authentication/authorization, RBAC).
- Exposure to enterprise chatbot systems, LLMOps pipelines, and continuous model evaluation in production.