Scale AI, Inc.
Tech Lead Manager, Machine Learning Research Scientist - LLM Evals
Scale AI, Inc., San Francisco, California, United States, 94199
Overview
Scale is a leading data and evaluation partner for frontier AI companies, dedicated to improving the evaluation and benchmarking of large language models (LLMs). We build industry‑leading LLM evals that set new standards for model performance assessment and develop rigorous, scalable, and fair evaluation methodologies to drive the next generation of AI capabilities.
Responsibilities
Lead a talented team of research scientists and research engineers focused on developing and implementing novel evaluation methodologies, metrics, and benchmarks for LLMs.
Conduct research on the effectiveness and limitations of existing LLM evaluation techniques.
Design and develop new evaluation benchmarks covering instruction following, factuality, robustness, and fairness.
Communicate, collaborate, and build relationships with clients and peer teams to facilitate cross‑functional projects.
Collaborate with internal teams and external partners to refine metrics and create standardized evaluation protocols.
Implement scalable and reproducible evaluation pipelines using modern ML frameworks.
Publish research findings in top‑tier AI conferences and contribute to open‑source benchmarking initiatives.
Remain up‑to‑date on ongoing research, help solve technical challenges, and participate in design decisions.
Stay deeply involved in the research community, shaping trends and setting new ones.
Thrive in a high‑energy, fast‑paced startup environment and dedicate the time and effort needed to drive impactful results.
Qualifications
5+ years of hands‑on experience with large language models, NLP, and Transformer modeling, in both research and engineering development.
Track record of delivering high‑impact research in a fast‑paced environment.
Experience supporting and leading a team of research scientists and research engineers.
Excellent written and verbal communication skills.
Published research in machine learning at major conferences (NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, etc.) and/or journals.
Previous experience in a customer‑facing role.
Compensation
Base salary range for full‑time positions in San Francisco, New York, and Seattle: $240,000 to $380,000 USD.
Compensation packages include base salary, equity, and benefits. The package may also include a commuter stipend. Benefits cover comprehensive health, dental, and vision coverage, retirement benefits, a learning and development stipend, generous PTO, and additional benefits as applicable.
Notes
Our policy requires a 90‑day waiting period before reconsidering candidates for the same role.
About Us
At Scale, our mission is to develop reliable AI systems for the world’s most important decisions. Our products provide high‑quality data and full‑stack technologies that power the world’s leading models and help enterprises and governments build, deploy, and oversee AI applications that deliver real impact. We work closely with industry leaders such as Meta, Cisco, DLA Piper, Mayo Clinic, Time Inc., the Government of Qatar, and U.S. government agencies including the Army and Air Force.
We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability status, gender identity or Veteran status.
We provide reasonable accommodations to applicants with physical and mental disabilities. If you need assistance or a reasonable accommodation during the application or recruiting process, please contact us at accommodations@scale.com. Please see the United States Department of Labor’s Know Your Rights poster for additional information.
We comply with the United States Department of Labor’s Pay Transparency provision.
We collect, retain, and use personal data for professional business purposes, including notifying you of job opportunities that may be of interest and sharing with our affiliates. We limit the data we collect to that necessary to manage applicant needs and comply with applicable laws. All information is treated according to our internal policies and privacy program. Please see our privacy policy for additional information.