Walmart
Director, Data Science - Quality & LLM Judging Systems for Conversational Commerce
Walmart, Sunnyvale, California, United States, 94086
Director, Data Science - Quality & LLM Judging Systems for Conversational Commerce
Walmart's Next Gen Commerce team is building intelligent, agentic systems that transform how customers shop through conversation. As Director, Data Science - Quality & LLM Judging Systems for Conversational Commerce, you will lead a critical pillar under the Senior Director of Data Science - Agentic AI for Conversational Commerce. Your mission is to define how we measure the effectiveness of the conversational shopping agent and the tools it invokes, ensuring we evaluate quality with both rigor and scale.
You will lead a team responsible for defining evaluation metrics, designing measurement methodologies, and executing cost-efficient evaluations. This includes combining traditional human-labeled approaches with advanced "LLM-as-a-judge" techniques. You will design prompt-based evaluation tasks, identify when human oversight is needed, and explore how to distill smaller LLMs to replicate human-like evaluation at scale. Beyond conversation quality, your scope includes evaluating the outputs of tools invoked by the agent, such as personalized recommendations, summary generation, or proactive suggestions, where traditional metric-based evaluations fall short and human judgment is required.
This is a hands-on leadership role requiring sharp judgment, strong experimental thinking, and fluency in both LLM prompting and applied ML. You will work closely with modeling, product, and platform teams to ensure that measurement drives improvement, and that the agent's behaviors align with quality, safety, and relevance at every step.
Responsibilities
Grow and lead a high-performing team of data scientists, fostering a culture of technical excellence, fast execution, and clear accountability.
Define evaluation strategy and success metrics for the conversational shopping agent and its tool outputs.
Develop scalable measurement methodologies combining human-labeled benchmarks, LLM-as-a-judge prompts, and automated pipelines.
Design and iterate on prompts that enable LLMs to perform structured evaluation tasks with high agreement with human judgment.
Explore cost-effective alternatives by generating synthetic training data and distilling smaller LLMs to perform specific judging tasks.
Establish quality review loops and integrate feedback from evaluations into model and product development.
Partner with engineering and product teams to ensure metrics are well-instrumented and align with long-term objectives.
Drive tooling and process development to support reliable, reproducible, and efficient evaluation at scale.
Minimum Qualifications
8+ years of experience in data science or applied machine learning.
5+ years leading teams focused on model evaluation, experimentation, or NLP applications.
Deep experience with large language models, including prompt engineering, structured evaluation, and response grading.
Familiarity with both human annotation workflows and LLM-based evaluators.
Strong understanding of metric design, statistical evaluation methods, and A/B testing.
Ability to translate ambiguous quality goals into concrete, testable evaluation frameworks.
Excellent communication and cross-functional collaboration skills.
Preferred Qualifications
Advanced degree in Computer Science, Machine Learning, or a related field.
Experience with conversational AI, tool-augmented agents, or retrieval-augmented generation.
Knowledge of efficient LLM adaptation techniques such as distillation, LoRA, or instruction tuning.
Familiarity with evaluating outputs where objective ground truth is undefined (e.g., personalization, summarization, recommendation).
Track record of influencing product quality through principled evaluation and measurement.
About Walmart Global Tech: Imagine working in an environment where one line of code can make life easier for hundreds of millions of people. That's what we do at Walmart Global Tech. We're a team of software engineers, data scientists, cybersecurity experts, and service professionals within the world's leading retailer who make an epic impact and are at the forefront of the next retail disruption. People are why we innovate, and people power our innovations. We are people-led and tech-empowered. We train our team in the skillsets of the future and bring in experts like you to help us grow. We have roles for those chasing their first opportunity as well as those looking for the opportunity that will define their career. Here, you can kickstart a great career in tech, gain new skills and experience for virtually every industry, or leverage your expertise to innovate at scale, impact millions and reimagine the future of retail.
Walmart's culture is a competitive advantage, and it's fostered by being together. Working together in person allows us to collaborate, align quickly and innovate with greater speed. We use our campuses to create purposeful connection rooted in deepening understanding and investing in the development of our associates. Our global headquarters is in Bentonville, Arkansas, with primary hubs in the San Francisco Bay area and New York/New Jersey.
Benefits: Beyond our great compensation package, you can receive incentive awards for your performance. Other great perks include 401(k) match, stock purchase plan, paid maternity and parental leave, PTO, multiple health plans, and much more.
Equal Opportunity Employer: Walmart, Inc. is an Equal Opportunity Employer - By Choice. We believe we are best equipped to help our associates, customers and the communities we serve live better when we really know them. That means understanding, respecting and valuing diversity (unique styles, experiences, identities, ideas and opinions) while being inclusive of all people.
You will also receive PTO and/or PPTO that can be used for vacation, sick leave, holidays, or other purposes. The amount you receive depends on your job classification and length of employment. It will meet or exceed the requirements of paid sick leave laws, where applicable.
Live Better U is a Walmart-paid education benefit program for full-time and part-time associates in Walmart and Sam's Club facilities. Programs range from high school completion to bachelor's degrees, including English Language Learning and short-form certificates. Tuition, books, and fees are completely paid for by Walmart.
Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to a specific plan or program terms.
The annual salary range for this position is $169,000.00-$338,000.00. Additional compensation includes annual or quarterly performance bonuses. Additional compensation for certain positions may also include: Stock.
Primary Location: 1395 Crossman Ave, Sunnyvale, CA 94089-1114, United States of America