Storm3
Research Scientist - Vision, Multimodality
Storm3, San Francisco, California, United States, 94199
Join one of the few research institutions worldwide with the resources to compete with top AI companies: tens of thousands of GPUs for state-of-the-art research in LLMs, multimodal AI, and agentic AI.
We are currently seeking AI talent with expertise in multimodal AI, video/image generation, diffusion models, and visual understanding to develop the next breakthrough in physical and generative machine intelligence.
Base pay range $250,000.00/yr – $400,000.00/yr
⚡ Research Scientists/Engineers (all levels)
Responsibilities
Research and develop novel methods in video/image generation and multimodality
Contribute to large-scale vision data infrastructure and model training
Source, process, and curate high-quality image and video data
Publish breakthrough findings and reports at top conferences
Requirements
PhD in Computer Science, Machine Learning, or Mathematics preferred
Strong publication record at top conferences, or contributions to leading AI models, particularly in controllable generative models, world models, or physical AI
2+ years of industry experience focused on VLMs, diffusion, visual understanding, 2D image/video
Strong ability to navigate ambiguity and influence research direction
Why apply
Opportunity to join a fast-growing core team that is already driving AI breakthroughs
Highly competitive salary package
Work alongside ambitious and bright superstars from tech and academia
Medical, dental, and vision insurance
Interested in applying? Click the ‘Easy Apply’ button, or email me your resume at
anir.gantugs@storm3.com