Vmax
Vmax is an applied research lab working at the frontier of reinforcement learning (RL). We are building new techniques for applying RL to Large Language Models (LLMs). Our research contributes directly to our RL platform, which automates the engineering involved in converting data and evals into RL environments.
About the role
Your objective will be to rapidly deliver bespoke environments and agents for our customers. You will translate customer needs into RL environments and then post-train agents within them. You will also shape our product and research directions, helping us productize our research and make RL more widely accessible.
Responsibilities
Build RL environments for our customers
Post-train LLM-based agents on domain-specific tasks
Productize Vmax research: apply environment generation and automated RL research to improve our customers' agents
Role Requirements
Experience post-training LLMs
Software engineering experience beyond research projects
Ability to independently build post-training data and training pipelines
Nice to have
Research experience in RL
Open-source contributions to RL frameworks
Role-specific location policy
This role is based in our San Francisco office. For exceptional candidates, we are willing to consider a hybrid arrangement.
Compensation
The expected salary range for this position is $250,000 - $450,000 USD.