Anthropic

Research Manager, Interpretability

Anthropic, San Francisco, California, United States, 94199

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the Interpretability team

When you see what modern language models are capable of, do you wonder, "How do these things work? How can we trust them?"

The Interpretability team’s mission is to reverse‑engineer how trained models work, and interpretability research is one of Anthropic’s core research bets on AI safety. We believe that a mechanistic understanding is the most robust way to make advanced systems safe.

We focus on mechanistic interpretability, which aims to discover how neural‑network parameters map to meaningful algorithms. Think of it as doing "biology" or "neuroscience" of neural networks, or treating neural networks as binary computer programs we are trying to reverse‑engineer.

Our work includes resolving the issue of superposition, identifying ways to decompose models into more interpretable components, and building circuits using features to understand the mechanisms of models like Claude 3.0 Sonnet and Claude Haiku 3.5.

As a manager on the Interpretability team, you will support a team of expert researchers and engineers, translating cutting‑edge research ideas into tangible goals and overseeing their execution.

Responsibilities

Partner with a research lead on direction, project planning and execution, hiring, and people development

Set and maintain a high bar for execution speed and quality, including identifying improvements to processes that help the team operate effectively

Coach and support team members to have more impact and develop in their careers

Drive the team's recruiting efforts, including hiring planning, process improvements, and sourcing and closing

Help identify and support opportunities for collaboration with other teams across Anthropic

Communicate team updates and results to other teams and leadership

Maintain a deep understanding of the team's technical work and its implications for AI safety

You may be a good fit if you

Are an experienced manager (minimum 2-5 years) with a track record of effectively leading highly technical research and/or engineering teams

Have a background in machine learning, AI, or a related technical field

Actively enjoy people management and are experienced with coaching and mentorship, performance evaluation, career development, and hiring for technical roles

Have strong project management skills, including prioritization and cross‑functional coordination and collaboration

Have managed technical teams through periods of ambiguity and change

Are a quick learner who can understand and contribute to discussions on complex technical topics, and are motivated to learn about our research

Are a strong communicator both in speaking and in writing

Believe that advanced AI systems could have a transformative effect on the world, and are passionate about helping make sure that transformation goes well

Strong candidates may also have

Experience scaling engineering infrastructure

Experience working on open‑ended, exploratory research agendas aimed at foundational insights

Some familiarity with our work and mechanistic interpretability

Location Policy

This role is expected to be in our SF office for 3 days a week.

We expect all staff to be in one of our offices at least 25% of the time. Some roles may require more time in our offices.

The expected base compensation for this position is $340,000 - $425,000 USD. Our total compensation package for full‑time employees includes equity and benefits, and may include incentive compensation.

Logistics

Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience.

Visa sponsorship: We do sponsor visas! However, we are not able to successfully sponsor visas for every role and every candidate. If we make you an offer, we will make every reasonable effort to obtain a visa and retain an immigration lawyer to help with this.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work.

How we're different

We believe that the highest‑impact AI research will be big science. At Anthropic we work as a single cohesive team on just a few large‑scale research efforts, and we value impact (advancing our long‑term goals of steerable, trustworthy AI) over smaller, more specific puzzles. We view AI research as an empirical science, which has as much in common with physics and biology as with traditional efforts in computer science. We are an extremely collaborative group and host frequent research discussions to ensure that we are pursuing the highest‑impact work at any given time. We greatly value communication skills.

Come work with us

Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.

As set forth in Anthropic’s Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.
