Hapiko

Senior+ Data Scientist - ML & Image Generation

Hapiko, New York, New York, US, 10261

Founded by Arun Gupta (former CEO of Grailed, which sold to GOAT Group in 2022) and Bob Whitney (Anthropic, NYT Games), we're on a mission to create safe, hands‑on AI experiences that fuel kids' imaginations rather than replace them.

Our first product, Stickerbox, is the world’s first voice‑to‑sticker printer: a device that instantly transforms a child’s spoken ideas into printable, colorable stickers. We sold out our first run, shipping for the holidays, and it’s already being called "one of the first products to make AI feel magical for kids and grounded for parents."

We have a $7M funding round led by Maveron (backers of Lovevery), Serena Ventures, and Ai2 (The Allen Institute). Stickerbox is bringing imagination to life for kids nationwide!

Why are we hiring? The technical challenge is real.

We’re running real‑time audio transcription, proprietary content safety systems, and custom image generation, all serving thousands of concurrent users with sub‑second latency. We’re training our own models from scratch, optimizing for kid‑friendly aesthetics, and building safety guardrails that actually work. We need a Data Scientist to own data quality, evaluation, and ML optimization across this entire pipeline. You’ll work with the team to define what to train on, how to measure success, and how to make our models better every day.

What you’ll do

As our first Data Science hire, you’ll collaborate with us on:

Model Training & Data

Build and curate large‑scale image datasets for training custom models

Design annotation pipelines and data quality processes

Analyze training runs and model outputs to guide iteration

Work with our team to define what to train on and how to evaluate it

ML Pipeline Optimization

Optimize our transcription pipeline for accuracy and latency

Improve image generation quality, prompt adherence, and consistency

Identify bottlenecks and failure modes across the pipeline

Run experiments and A/B tests to measure improvements

Safety & Content Moderation

Refine our existing content safety systems for child‑appropriate outputs and develop new ones

Build on our evaluation datasets for safety edge cases

Analyze moderation performance and reduce false positives/negatives

Stay current on best practices for AI safety in generative systems

Evaluation & Metrics

Build evaluation frameworks to measure model performance at scale

Define metrics that correlate with user satisfaction (aesthetic quality, relevance, safety)

Develop automated evaluation pipelines (LLM‑as‑judge, CLIP scores, human eval)

Track experiments and communicate findings to the team

Prompt Engineering

Optimize prompts for transcription accuracy and image generation quality

Develop systematic approaches to prompt testing and iteration

Build prompt templates and guidelines for different use cases

What we're looking for

5+ years in data science or applied ML

Experience optimizing production ML systems

Strong statistical and analytical skills

Familiarity with LLMs and image generation models

Python proficiency; comfortable with PyTorch

Experience building evaluation frameworks

Track record of improving ML system performance through data and experimentation

Nice to have

Experience with content moderation or trust & safety

Background in speech/audio ML or computer vision

Experience with human annotation pipelines (Label Studio, Scale AI)

Familiarity with prompt engineering techniques and LLM‑based evaluation

Location: NYC only, on‑site in our Brooklyn‑based office, close to most major train lines. We are flexible on WFH, but we like to be in the office the majority of the week.

Salary Range: $150k - $250k base + equity and benefits
