Altimetrik
Role Focus:
Building production-grade GenAI solutions with a heavy emphasis on RAG pipelines, evaluation, and observability.

Key Responsibilities
Design and optimize Retrieval-Augmented Generation pipelines leveraging LLMs. Implement advanced retrieval strategies, reranking, and context management patterns to minimize hallucinations.
Prompt Engineering: Develop system/tool prompts and function calling schemas. Implement prompt versioning with evaluation hooks for continuous improvement.
Evaluation & Observability: Establish ground truth datasets and confusion metrics. Build evaluation harnesses (LLM-as-judge with human review). Track and monitor cost, latency, and quality metrics.
Workflow Orchestration: Integrate pipelines with orchestration tools like Airflow or Dagster for batch and real-time workflows.
Security & Privacy: Ensure proper handling of PII and enforce RBAC controls within retrieval and generation flows.
Python Engineering: Deliver production-quality pipelines using Python with strong practical development skills.

Required Skills & Qualifications
Expertise in prompt engineering and function calling design.
Hands-on with LangChain, LlamaIndex, or equivalent frameworks.
Strong background in evaluation frameworks (ground truth, model-as-judge, confusion/error metrics).
Proficiency in Python with experience in building scalable, maintainable systems.
Familiarity with workflow orchestration (Airflow, Dagster) in production environments.
Knowledge of security & privacy best practices (PII handling, RBAC).
Experience integrating with cloud LLM providers (Azure OpenAI, Bedrock, Vertex AI).

Success Criteria
A production-ready prompt + RAG pipeline system with measurable quality lift (accuracy/confusion metrics) and reduced hallucinations.
A repeatable evaluation harness with automated metrics, human-in-the-loop reviews, and observability (cost, latency, drift detection).
Demonstrated improvements in retrieval accuracy, hallucination rate reduction, and overall GenAI reliability.

Seniority level: Associate
Employment type: Full-time
Job function: Information Technology Software Development