krea.ai

Data Engineer

krea.ai, San Francisco, California, United States, 94199


About Krea

At Krea, we are building next‑generation AI creative tools. We are dedicated to making AI intuitive and controllable for creatives. Our mission is to build tools that empower human creativity, not replace it. We believe AI is a new medium that allows us to express ourselves through various formats—text, images, video, sound, and even 3D. We are building better, smarter, and more controllable tools to harness this medium.

Job Description

Data is one of the fundamental pieces of Krea. Huge amounts of data power our AI training pipelines, our analytics and observability, and many of the core systems that make Krea tick. As a data engineer, you will build distributed systems to process petabytes of files of all kinds (images, video, and even 3D data), and you should feel comfortable solving scaling problems as you go. You will work closely with our research team to build ML pipelines and deploy models to make sense of raw data. You will work with massive amounts of compute on huge Kubernetes GPU clusters – our main GPU cluster takes up an entire datacenter from our provider. You will learn machine learning engineering from world‑class researchers on a small yet highly effective tight‑knit team (ML experience is a bonus, but you can also learn it on the job).

Example Projects

- Find clean scenes in millions of videos, running distributed data pipelines that detect shot boundaries and save timestamps of clips.
- Solve orchestration and scaling issues with a large‑scale distributed GPU job processing system on Kubernetes.
- Build systems to deploy and combine different LLMs to caption massive amounts of multimedia data in a variety of ways.
- Design multi‑stage pipelines to turn petabytes of raw data into clean downstream datasets, with metadata, annotations, and filters.

Qualifications
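To give a flavor of the first project, the core of shot‑boundary detection can be sketched as a histogram‑difference pass over consecutive frames. This is an illustrative toy in plain NumPy (the function name and threshold are our own assumptions, not Krea's actual pipeline, which would run distributed across workers and likely use a dedicated library such as PySceneDetect):

```python
import numpy as np

def shot_boundaries(frames, threshold=0.5):
    """Toy shot-boundary detector: flag frame indices where the
    intensity histogram changes sharply versus the previous frame.
    Illustrative only -- a production pipeline would be distributed
    and use a robust detector rather than a raw histogram distance."""
    boundaries = []
    prev_hist = None
    for i, frame in enumerate(frames):
        # Normalized 32-bin intensity histogram of this frame
        hist, _ = np.histogram(frame, bins=32, range=(0, 255))
        hist = hist / hist.sum()
        if prev_hist is not None:
            # A hard cut produces a spike in the L1 histogram distance
            if np.abs(hist - prev_hist).sum() > threshold:
                boundaries.append(i)
        prev_hist = hist
    return boundaries
```

For example, three dark frames followed by three bright frames yields a single detected boundary at the index of the first bright frame. Saving these indices as timestamps is then a matter of dividing by the video's frame rate.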

- Python, PyArrow, DuckDB, SQL, massive relational databases, PyTorch, Pandas, NumPy
- Kubernetes
- Designing and implementing large‑scale ETL systems
- Fundamental knowledge of containerization, operating systems, file systems, and networking
- Distributed systems design

About Us

We’re building AI creative tooling. We’ve raised over $83M from the best investors in Silicon Valley. We’re a team of 12 serving millions of active users, and we’re scaling aggressively.
