Anthropic
Research Engineer / Scientist, Model Welfare
Anthropic, San Francisco, California, United States, 94199
About Anthropic
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About The Role
As a Research Engineer/Scientist within the newly formed Model Welfare program, you will be among the first to work to better understand, evaluate, and address concerns about the potential welfare and moral status of AI systems. You are curious about the intersection of machine learning, ethics, and safety, and are adept at navigating technical and philosophical uncertainty. You’ll run technical research projects to investigate model characteristics of plausible relevance to welfare, consciousness, or related properties, and will design and implement low-cost interventions to mitigate the risk of welfare harms. Your work will often involve collaboration with other teams, including Interpretability, Finetuning, Alignment Science, and Safeguards.

Possible projects include investigating introspective self-reports from models, exploring welfare-relevant features and circuits, expanding welfare assessments for future frontier models, evaluating welfare-relevant capabilities as a function of model scale, developing strategies for high-trust commitments to models, and exploring interventions to deploy into production (e.g., allowing models to end harmful interactions). The role is expected to be based in the San Francisco office.

Responsibilities
Investigate and improve the reliability of introspective self-reports from models
Collaborate with Interpretability to explore potentially welfare-relevant features and circuits
Improve and expand welfare assessments for future frontier models
Evaluate the presence of potentially welfare-relevant capabilities as a function of model scale
Develop strategies for making high-trust/verifiable commitments to models
Explore possible interventions and deploy them into production to mitigate welfare harms

Qualifications
Significant applied software, ML, or research engineering experience
Experience contributing to empirical AI research projects and/or technical AI safety research
Ability to turn abstract theories into tractable research hypotheses and experiments
Preference for fast iteration over long, extensive projects
Willingness to dive into new technical areas regularly
Concern for the potential impacts of AI development on humans and AI systems

Strong candidates may also
Have authored research papers in ML, NLP, AI safety, interpretability, or related fields
Be familiar with moral philosophy, cognitive science, neuroscience, or related fields (not a substitute for technical skills)
Be effective science communicators with a track record of public communication
Have strong project management skills

Candidates need not have all of the skills listed, nor formal certifications or education credentials.

Annual Salary
The expected salary range for this position is $315,000 – $340,000 USD.

Logistics
Education requirements: At least a Bachelor's degree in a related field or equivalent experience.
Location-based hybrid policy: Staff are expected to be in one of our offices at least 25% of the time, with some roles requiring more time onsite.
Visa sponsorship: We sponsor visas where possible; if we make you an offer, we will make reasonable efforts to obtain a visa with the help of an immigration lawyer.

We encourage you to apply even if you do not meet every single qualification. Not all strong candidates will meet every qualification. We value diverse perspectives and encourage applications from underrepresented groups.

How We're Different
We pursue high-impact AI research as a single cohesive team, focusing on large-scale efforts and the long-term goals of steerable, trustworthy AI. We value communication and collaboration and host frequent discussions to ensure high-impact work. Our recent directions include GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.

Come work with us!
Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a collaborative office space.