LMArena
About the Role
LMArena is seeking Machine Learning Scientists to advance how we evaluate and understand AI models. You’ll design and analyze experiments that use human preference signals to uncover what makes models useful, trustworthy, and capable. Your work will contribute to the scientific foundations of understanding AI at scale.
Location & Type
Location: SF Bay Area/Remote
Type: Full-Time
Why Join Us?
LMArena, launched by researchers from UC Berkeley’s SkyLab, is an open platform that makes AI evaluation transparent and human‑centered. Trusted by leading organizations such as Google, OpenAI, Meta, and Microsoft, the platform serves over a million monthly users. Here, you’ll help shape the next generation of safe, aligned AI systems.
Responsibilities
Design and conduct experiments to evaluate AI model behavior across reasoning, style, robustness, and user preference dimensions.
Develop new metrics, methodologies, and evaluation protocols that go beyond traditional benchmarks.
Analyze large‑scale human voting and interaction data to uncover insights into model performance and user preferences.
Collaborate with engineers to implement and scale research findings into production systems.
Prototype and test research ideas rapidly, balancing rigor with iteration speed.
Author internal reports and external publications that contribute to the broader ML research community.
Partner with model providers to shape evaluation questions and support responsible model testing.
Contribute to the scientific integrity and transparency of the LMArena leaderboard and tools.
Requirements
PhD or equivalent research experience in Machine Learning, Natural Language Processing, Statistics, or a related field.
Strong understanding of LLMs and modern deep learning architectures (e.g., Transformers, diffusion models, reinforcement learning with human feedback).
Proficiency in Python and research libraries such as PyTorch, JAX, or TensorFlow.
Demonstrated ability to design and analyze experiments with statistical rigor.
Experience publishing research or working on open‑source projects in ML, NLP, or AI evaluation.
Comfortable working with real‑world usage data and designing metrics beyond standard benchmarks.
Ability to translate research questions into practical systems and collaborate across engineering and product teams.
Passion for open science, reproducibility, and community‑driven research.
What We Offer
Competitive salary, meaningful equity, and comprehensive healthcare coverage.
Opportunity to work on cutting‑edge AI with a small, mission‑driven team.
A culture that values transparency, trust, and community impact.
Potential for significant career growth in a high‑impact, global organization.
The chance to shape the future of AI evaluation and influence industry standards.
Job Function & Industry
Job Function: Research Services & AI Evaluation
Industry: Research Services