Poseidon Research
Location: Remote or New York City, US
Organization: Poseidon Research
Compensation: $100,000–$150,000 annually, or higher depending on experience
Type: One-year contract
This position is funded through a charitable research grant.
Poseidon Research is an independent AI safety laboratory based in New York City. Our mission is to make advanced AI systems transparent, trustworthy, and governable through deep technical research in interpretability, control, and secure monitoring.
We investigate how models think, hide, and reason: from understanding encoded reasoning and steganography in reasoning models to building open-source monitoring tools that preserve human oversight. Our research spans mechanistic interpretability, reinforcement learning, control, information theory, and cryptography, bridging the theoretical and the practical.
You could be a cog in a big lab and gamble with humanity's future. Or you could own your entire research platform at Poseidon Research, pioneering the infrastructure needed to accelerate AI safety and build a safe, secure, and prosperous future.
The Role
We are hiring a Research Engineer to implement and scale experiments studying encoded reasoning and steganography in modern reasoning models. This is a hands-on, highly technical position focused on experiment design, model evaluation, and platform engineering. You will collaborate closely with research scientists to turn conceptual ideas into reproducible systems by building pipelines, datasets, and model organisms that make opaque behaviors measurable and controllable.
Responsibilities
We're looking for a creative, rigorous engineer who loves to build in order to understand how safety issues intersect with reality. You will:
Implement and reproduce prior work on encoded reasoning and steganography, extending it to current open-weight reasoning models (e.g., DeepSeek-R1 and V3, GPT-OSS, QwQ).
Develop and maintain modular experiment pipelines for evaluating steganography, encoded reasoning, and reward hacking.
Build and test fine-tuning workflows (SFT or RL-based) to study emergent encoded reasoning and reward hacking behaviors.
Collaborate with our research leads to design safety cases and control-agenda monitoring mechanisms suited to countering various types of unsafe chain of thought.
Extend interpretability infrastructure, including probing, feature ablation, and sparse autoencoder (SAE) analysis pipelines using frameworks like TransformerLens.
Engineer datasets and evaluation suites for robust paraphrasing, steganography cover tasks, and monitoring robustness metrics.
Collaborate with scientists to identify the causal directions and larger-scale mechanisms underlying encoded reasoning, via standard interpretability methods, DAS, MELBO, targeted LAT, and related techniques.
Ensure reproducibility through clean code, experiment tracking, and open-source releases.
Contribute to research communication by preparing writeups, visualizations, and benchmark results for research vignettes and publications.
Ideal Candidate
Core Technical Skills
Strong Python and PyTorch experience.
Experience with LLM experimentation using frameworks such as Hugging Face Transformers, TransformerLens, or equivalent.
Building reproducible ML pipelines, including data preprocessing, logging, visualization, and evaluation.
RL fine-tuning or training small-to-mid-scale models through frameworks like TRL, verl, OpenRLHF, or equivalents.
Proficiency with experiment tracking tools such as Weights & Biases or MLflow, and with Git.
Active proficiency and/or intellectual curiosity in working with AI-assisted coding and research tools such as Claude Code, Codex, Cursor, Roo, Cline, or equivalents.
Nice to have
Familiarity with interpretability methods such as probing, activation patching, or feature attribution.
Understanding of encoded reasoning, steganography, or information‑theoretic approaches to model communication; or some background in formal cryptography, information theory, or offensive cybersecurity.
Experience with mechanistic interpretability techniques such as feature visualization, direction ablation, SAEs, crosscoders, and circuit tracing.
Background in information security, control, or formal verification.
Prior publications.
Mindset
Excited by deep technical challenges with high safety implications.
Values open science, clarity, and reproducibility.
Comfortable working in a small, fast-moving research team with high autonomy.
Conscientiousness, honesty, and an agentic disposition.
Why Join Poseidon Research?
Mission-Driven Research: Every project contributes directly to AI safety, transparency, and governance.
Ownership: Lead your own research platform with mentorship, not micromanagement.
Interdisciplinary Collaboration: We regularly work with top researchers from DeepMind, Anthropic, other AI safety startups, and academic partners.
Impact: Develop techniques, open-source tools, and benchmarks that shape global standards for safe AI deployment. Our work has already been cited by Anthropic, DeepMind, Meta, Microsoft, and MILA.
Lean, fast, and serious: We move quickly, publish openly, and care deeply about getting it right.
Application
Please include:
A short research statement about which problems in AI safety interest you and how they intersect with Poseidon's aims.
A CV, and a Google Scholar link if applicable.
Links to code or papers not included in your CV.