Abaka AI

Data Engineer

Abaka AI, Palo Alto, California, United States, 94306


About Abaka AI

Abaka AI is built on one mission: to be the world’s most trusted data partner for AI companies. More than 1,000 industry leaders across Generative AI, Embodied AI, and Automotive AI rely on us to power their data pipelines. With our headquarters in Silicon Valley and teams in Paris, Singapore, and Tokyo, we support global partners with fast, reliable, and scalable data solutions.

Our offerings include a diverse catalog of off-the-shelf datasets (image, video, multimodal, reasoning, 3D, and beyond) as well as comprehensive data collection and annotation services. Whether teams need raw data, curated datasets, or full-cycle data engineering, Abaka AI provides the foundation for building high-performance AI systems.

About the Role

We’re hiring our first Data Engineer in the United States, a foundational role that will shape Abaka AI’s data engineering standards, systems, and culture from day one. This is an opportunity to take full ownership of how multimodal data is sourced, processed, cleaned, annotated, and delivered to some of the world’s most advanced AI teams.

You won’t just be building pipelines—you’ll be developing the infrastructure that powers frontier AI models. You’ll partner directly with foundation model teams to understand their data needs, translate them into scalable workflows, and deliver high‑quality multimodal datasets that meaningfully impact model performance.

As an early member of our engineering team, you’ll influence everything from our long‑term roadmap to our internal tooling ecosystem. If you thrive in high‑ownership environments and want to shape the machine learning foundation of a fast‑moving AI company, this role offers an opportunity to make an immediate and lasting impact.

Responsibilities

Work closely with foundation model clients to understand their data requirements, and coordinate internal teams on tailored delivery plans that ensure on‑time, high‑quality data delivery meeting client expectations for format, precision, and volume.

Lead the development of mid‑ to long‑term plans for the data engineering function. Build scalable, end‑to‑end pipelines for multimodal data (text, image, audio, video, 3D point cloud, etc.) across data sourcing, cleaning, annotation, QA, storage, and iterative optimisation for training, fine‑tuning, and evaluation.

Develop solutions to core technical challenges in multimodal data processing, including cross‑modal alignment (for example, image‑text semantic matching), large‑scale data cleaning (deduplication, denoising, format normalisation), annotation efficiency, and data encryption and security.

Partner with algorithm, product, and business teams by providing feedback on data bottlenecks, refining internal tooling and services, and supporting client‑facing teams with technical documentation and pre‑sales materials.

Evaluate and optimise the cost structure of data processing operations, including headcount, infrastructure, and tooling, to balance quality, efficiency, and scalability.

Qualifications

Strong background in computer science, data engineering, artificial intelligence, or related fields, with hands‑on experience building or operating large‑scale data systems.

1+ years of experience in data engineering or data operations. Leadership experience is highly valued, and experience with LLM or multimodal dataset preparation is a strong plus.

Deep understanding of end‑to‑end multimodal data workflows, with hands‑on experience in at least two modalities (text, images, audio, or video).

Proficiency in designing technical architectures for large‑scale data pipelines, including distributed processing and automation frameworks, and familiarity with data privacy and security best practices such as access control and data anonymisation.

Strong execution and team management capabilities, with the ability to translate high‑level objectives into actionable plans and drive team results.

Excellent communication and cross‑functional collaboration skills, with the ability to clearly articulate technical and operational requirements, resolve conflicts, and manage stakeholder expectations.

Strong sense of ownership and resilience, with comfort working in a fast‑paced, rapidly evolving AI environment and the ability to manage urgent delivery timelines.

Compensation & Benefits

The base salary range for this position is $150,000–$225,000 USD annually.

Compensation may vary outside of this range depending on a number of factors, including a candidate’s qualifications, skills, competencies, and experience. Base pay is one part of the Total Package that is provided to compensate and recognise employees for their work at Abaka AI. This role is eligible for equity, as well as a comprehensive benefits package (health, dental, vision, PTO, flexible work schedule).

Seniority level: Entry level

Employment type: Full‑time

Job function: Information Technology

Industries: IT Services and IT Consulting
