Poseidon Research

Research Engineer, Machine Learning

Location: Remote or New York City, US

Organization: Poseidon Research

Compensation: $100,000–$150,000 annually, or higher depending on experience

Type: One-year contract

This position is funded through a charitable research grant.

Poseidon Research is an independent AI safety laboratory based in New York City. Our mission is to make advanced AI systems transparent, trustworthy, and governable through deep technical research in interpretability, control, and secure monitoring.

We investigate how models think, hide, and reason: from understanding encoded reasoning and steganography in reasoning models to building open-source monitoring tools that preserve human oversight. Our research spans mechanistic interpretability, reinforcement learning, control, information theory, and cryptography, bridging the theoretical and the practical.

You could be a cog in a big lab and gamble with humanity's future. Or you could own your entire research platform at Poseidon Research, pioneering the infrastructure needed to accelerate AI safety and build a safe, secure, and prosperous future.

The Role

We are hiring a Research Engineer to implement and scale experiments studying encoded reasoning and steganography in modern reasoning models. This is a hands-on, highly technical position focused on experiment design, model evaluation, and platform engineering. You will collaborate closely with research scientists to turn conceptual ideas into reproducible systems by building pipelines, datasets, and model organisms that make opaque behaviors measurable and controllable.

Responsibilities

We're looking for a creative, rigorous engineer who loves to build in order to understand how safety issues intersect with reality. You will:

Implement and reproduce prior work on encoded reasoning and steganography, extending it to current open-weight reasoning models (e.g., DeepSeek-R1 and V3, GPT-OSS, QwQ).

Develop and maintain modular experiment pipelines for evaluating steganography, encoded reasoning, and reward hacking.

Build and test fine-tuning workflows (SFT- or RL-based) to study emergent encoded reasoning and reward hacking behaviors.

Collaborate with our research leads to design safety cases and control-agenda monitoring mechanisms suitable for countering various types of unsafe chain of thought.

Extend interpretability infrastructure, including probing, feature ablation, and sparse autoencoder (SAE) analysis pipelines using frameworks like TransformerLens (a minimal probing sketch follows this list).

Engineer datasets and evaluation suites for robust paraphrasing, steganography cover tasks, and monitoring robustness metrics.

Collaborate with scientists to identify causal directions and larger-scale mechanisms (via standard interpretability methods, DAS, MELBO, targeted LAT, and related methods) underlying encoded reasoning.

Ensure reproducibility through clean code, experiment tracking, and open-source releases.

Contribute to research communication by preparing write-ups, visualizations, and benchmark results for research vignettes and publications.
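
For illustration only, here is a minimal, hedged sketch of the kind of probing workflow the interpretability item above refers to: caching residual-stream activations with TransformerLens and fitting a simple linear probe. The model, layer, prompts, and labels are illustrative assumptions, not Poseidon's actual pipeline.

# Minimal sketch (illustrative, not Poseidon's actual setup): cache residual-stream
# activations with TransformerLens and fit a simple linear probe on top of them.
from transformer_lens import HookedTransformer
from sklearn.linear_model import LogisticRegression

model = HookedTransformer.from_pretrained("gpt2")  # small stand-in for a larger reasoning model

prompts = ["The password is hidden in the poem.", "The weather today is sunny."]
labels = [1, 0]  # hypothetical "contains encoded content" labels

features = []
for prompt in prompts:
    tokens = model.to_tokens(prompt)
    _, cache = model.run_with_cache(tokens)
    # Residual stream after block 6, at the final token position (layer choice is arbitrary here).
    resid = cache["resid_post", 6][0, -1]
    features.append(resid.detach().cpu().numpy())

probe = LogisticRegression(max_iter=1000).fit(features, labels)
print("Probe train accuracy:", probe.score(features, labels))

In practice the same pattern scales to large labelled datasets, held-out evaluation splits, and sweeps over layers and hook points; this snippet only shows the shape of the workflow.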

Ideal Candidate

Core Technical Skills

Strong Python and PyTorch experience.

Experience with LLM experimentation using frameworks such as Hugging Face Transformers, TransformerLens, or equivalent.

Building reproducible ML pipelines, including data preprocessing, logging, visualization, and evaluation.

RL fine-tuning or training small-to-mid-scale models through frameworks like TRL, verl, OpenRLHF, or equivalents (see the sketch after this list).

Proficiency with experiment tracking tools such as Weights & Biases or MLflow, and Git.

Active proficiency with, and/or intellectual curiosity about, AI-assisted coding and research tools such as Claude Code, Codex, Cursor, Roo, Cline, or equivalents.
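
As a point of reference for the fine-tuning item above, here is a minimal sketch of a supervised fine-tuning run using TRL's SFTTrainer (recent TRL versions accept a model id string directly). The model and dataset names are illustrative stand-ins, not a prescribed configuration.

# Minimal SFT smoke test with TRL; model and dataset are illustrative assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train[:1000]")  # small slice for a quick run

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # small open-weight model as a stand-in
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-smoke-test", max_steps=50, per_device_train_batch_size=2),
)
trainer.train()

The same structure carries over to RL-based fine-tuning (e.g., TRL's reward-model and policy-optimization trainers), with the reward signal swapped in for the supervised targets.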

Nice to have

Familiarity with interpretability methods such as probing, activation patching, or feature attribution.

Understanding of encoded reasoning, steganography, or information‑theoretic approaches to model communication; or some background in formal cryptography, information theory, or offensive cybersecurity.

Experience with mechanistic interpretability techniques such as feature visualization, direction ablation, SAEs, crosscoders, and circuit tracing.

Background in information security, control, or formal verification.

Prior publications.

Mindset

Excited by deep technical challenges with high safety implications.

Values open science, clarity, and reproducibility.

Comfortable working in a small, fast-moving research team with high autonomy.

Conscientious, honest, and agentic in disposition.

Why Join Poseidon Research?

Mission-Driven Research: Every project contributes directly to AI safety, transparency, and governance.

Ownership: Lead your own research platform with mentorship, not micromanagement.

Interdisciplinary Collaboration: We regularly work with top researchers from DeepMind, Anthropic, other AI safety startups, and academic partners.

Impact: Develop techniques, open-source tools, and benchmarks that shape global standards for safe AI deployment. Our work has already been cited by Anthropic, DeepMind, Meta, Microsoft, and MILA.

Lean, fast, and serious: We move quickly, publish openly, and care deeply about getting it right.

Application

Please include:

A short research statement about what problems in AI safety interest you and how they intersect with Poseidon's aims.

CV, plus a Google Scholar link if applicable.

Links to code or papers not already in your CV.
