Vast

Data Engineer - Analytics Infrastructure (Foundational Hire)

Vast, Los Angeles, California, United States, 90079


About Us

Vision: To make life substrate independent through Vast Artificial Intelligence

Mission: To organize, optimize, and orient the world's computation

Vast.ai's cloud powers AI projects and businesses all over the world. We are democratizing and decentralizing AI computing, reshaping our future for the benefit of humanity.

We are a growing and highly motivated team dedicated to an ambitious technical plan. Our structure is flat, our ambitions are out-sized, and leadership is earned by shipping excellence.

We seek a data engineer with strong intrinsic drive, a true passion for uncovering insights from data, and a mix of analytical, programming, and communication skills.

LOCATION: On-site at our office in Westwood, Los Angeles

TYPE: Full-time • On-site • Immediate start preferred

REPORTS TO: Operations (partnering closely with Engineering)

About the Role

We're hiring a Data Engineer to build and own the end-to-end data platform at Vast.ai. This is a foundational role: you'll own the 0→1 build of ingestion, modeling, governance, and self-serve analytics in QuickSight for Marketing, Sales, Accounting, and leadership.

This is a hands-on role for a builder who can move fast: designing schemas, implementing ELT/ETL, hardening data quality, and enabling secure, governed access to data across the company.


What You'll Do

- Own the data pipeline: design, build, and operate batch/streaming ingestion from product, billing, CRM, support, and marketing/ad platforms into a central warehouse.
- Model the data: create clean, well-documented staging and business marts (dimensional/star schemas) that map to the needs of Marketing, Sales, Accounting/Finance, and Operations.
- Enable self-serve analytics: publish certified datasets with row-/column-level security, manage refresh SLAs, and make it easy for teams to help themselves.
- Collaborate cross-functionally: intake requirements, translate them into data contracts and models, and partner with Engineering on event/telemetry capture.
- Document & scale: maintain clear docs, lineage, and a pragmatic data catalog so others can discover and trust the data.

Tech Stack

Our current environment includes PostgreSQL, Python, SQL, and QuickSight. You'll lead the next step-function increase in maturity using a pragmatic, AWS-centric stack such as:

- AWS: S3, Glue/Athena or Redshift, Lambda/Step Functions, IAM/KMS
- Orchestration & Modeling: Airflow or Dagster; dbt (or equivalent SQL modeling)
- Data Quality & Observability: built-in checks or tools like Great Expectations
- Source Connectivity: APIs/webhooks; optionally Airbyte/Fivetran for managed connectors
- Versioning/Infra: Git/GitHub Actions; Terraform (nice to have)
- Marketing attribution: Segment, PostHog, others

(We're flexible on exact tools; strong fundamentals matter most.)

Qualifications

Must-have

- 3+ years (typically 3-6) in a Data Engineering role building production ELT/ETL on a cloud platform (AWS strongly preferred).
- Expert SQL and solid Python for data processing/automation.
- Proven experience designing data models (staging, marts, star schemas) and standing up a warehouse/lakehouse.
- Orchestration, scheduling, and operational ownership (SLAs, alerting, runbooks).
- Experience enabling a BI layer (ideally QuickSight) with secure, governed datasets.
- Strong collaboration and communication; able to gather requirements from non-technical stakeholders and translate them into data contracts.

Nice-to-have

- Marketing/Sales/RevOps data (CRM, ads, attribution), Accounting/Finance integrations, or product telemetry/event pipelines.
- Stream processing (Kafka/Kinesis), CDC, or near-real-time ingestion.
- Data privacy/security best practices (e.g., CPRA), partitioning/performance tuning, and cost management on AWS.

90-Day Outcomes

- Inventory & architecture: clear map of sources, proposed target architecture, and a prioritized backlog aligned with Ops/Engineering.
- First pipelines live: automated ingestion + core staging tables with data quality checks and alerts.
- Business marts: at least two curated domains live (e.g., Marketing & Sales) powering certified QuickSight datasets for stakeholders.
- Runbook & docs: onboarding-ready documentation, lineage, and incident playbooks.

Interview Process (1 week)

- 15 min: Initial screening (virtual)
- 45 min: Architecture deep-dive into our data environment and target platform (virtual)
- 2 hours: On-site practical: build/modify a small ETL + modeling exercise; discuss trade-offs, quality, and ops

Annual Salary Range

$140,000 - $190,000 + equity + benefits

Benefits

- Comprehensive health, dental, vision, and life insurance
- 401(k) with company match
- Meaningful early-stage equity
- Onsite meals, snacks, and close collaboration with founders/tech leaders
- Ambitious, fast-paced startup culture where initiative is rewarded