Vast.ai
Data Engineer — Analytics Infrastructure (Foundational Hire)
Vast.ai, Los Angeles, California, United States, 90079
About Us
Vision: To make life substrate independent through Vast Artificial Intelligence.
Mission: To organize, optimize, and orient the world’s computation.
Vast.ai’s cloud powers AI projects and businesses all over the world. We are democratizing and decentralizing AI computing, reshaping our future for the benefit of humanity.
Location: On‑site at our office in Westwood, Los Angeles
Type: Full‑time
About the Role
This is a foundational role: you’ll own the 0→1 build of our data platform (ingestion, modeling, governance, and self‑serve analytics in QuickSight) for Marketing, Sales, Accounting, and leadership.
We’re hiring a Data Engineer to build and own the end‑to‑end data platform at Vast.ai.
This is a hands‑on role for a builder who can move fast: designing schemas, implementing ELT/ETL, hardening data quality, and enabling secure, governed access to data across the company.
What You’ll Do
Own the data pipeline: design, build, and operate batch/streaming ingestion from product, billing, CRM, support, and marketing/ad platforms into a central warehouse.
Model the data: create clean, well‑documented staging and business marts (dimensional/star schemas) that map to the needs of Marketing, Sales, Accounting/Finance, and Operations.
Enable self‑serve BI: publish certified datasets with row‑/column‑level security, manage refresh SLAs, and make it easy for teams to answer their own questions (a minimal quality‑gate sketch follows this list).
Collaborate cross‑functionally: intake requirements, translate them into data contracts and models, and partner with Engineering on event/telemetry capture.
Document & scale: maintain clear docs, lineage, and a pragmatic data catalog so others can discover and trust the data.
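To make “hardening data quality” concrete, here is a minimal sketch of the kind of gate that might sit between a staging load and a certified dataset. The table name (stg_invoices), columns, and thresholds are hypothetical placeholders, not our schema; plain checks like this, or a framework such as Great Expectations, both fit the stack described below.

import pandas as pd

def check_stg_invoices(df: pd.DataFrame) -> list[str]:
    """Basic quality gates for a hypothetical stg_invoices staging table.

    Returns human-readable failure messages; an empty list means the batch passes.
    """
    failures: list[str] = []

    # The primary key must be present and unique.
    if df["invoice_id"].isna().any():
        failures.append("invoice_id contains nulls")
    if df["invoice_id"].duplicated().any():
        failures.append("invoice_id contains duplicates")

    # Billing amounts should never be negative in this (hypothetical) source.
    if (df["amount_usd"] < 0).any():
        failures.append("amount_usd has negative values")

    # Freshness check against a 24-hour refresh SLA, assuming
    # created_at is stored as a timezone-aware UTC timestamp.
    lag = pd.Timestamp.now(tz="UTC") - df["created_at"].max()
    if lag > pd.Timedelta(hours=24):
        failures.append(f"newest record is {lag} old")

    return failures

In practice a gate like this would run inside the orchestrator after each load, alert on failure, and block the refresh of the certified QuickSight dataset it feeds.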
Tech Stack
Our current environment includes PostgreSQL, Python, SQL, and QuickSight. You’ll lead the next step‑function in maturity using a pragmatic, AWS‑centric stack such as the one below (an illustrative pipeline sketch follows the list):
AWS: S3, Glue/Athena or Redshift, Lambda/Step Functions, IAM/KMS
Orchestration & Modeling: Airflow or Dagster; dbt (or equivalent SQL modeling)
Data Quality & Observability: built‑in checks or tools like Great Expectations
Source Connectivity: APIs/webhooks; optionally Airbyte/Fivetran for managed connectors
Versioning/Infra: Git/GitHub Actions; Terraform (nice to have)
Marketing attribution: Segment, PostHog, and others
(We’re flexible on exact tools—strong fundamentals matter most.)
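For flavor, here is a minimal, hypothetical sketch of a first pipeline on this stack: a daily Airflow task that pulls one day of records from a source API and lands raw JSON in S3 for downstream modeling. The endpoint, bucket, and DAG name are placeholders, not our actual configuration.

import json
from datetime import datetime

import boto3
import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_deals_to_s3(ds: str, **_) -> None:
    """Pull one day of CRM deals (hypothetical API) and land raw JSON in S3."""
    resp = requests.get(
        "https://api.example-crm.com/v1/deals",  # placeholder endpoint
        params={"updated_on": ds},  # Airflow injects the logical date as `ds`
        timeout=30,
    )
    resp.raise_for_status()
    boto3.client("s3").put_object(
        Bucket="example-raw-zone",  # placeholder bucket
        Key=f"crm/deals/dt={ds}/deals.json",  # partitioned by load date
        Body=json.dumps(resp.json()).encode("utf-8"),
    )

with DAG(
    dag_id="crm_deals_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
):
    PythonOperator(task_id="extract_deals_to_s3", python_callable=extract_deals_to_s3)

From there, Glue/Athena or Redshift exposes the raw zone to SQL staging models, and gates like the earlier sketch decide whether a batch is promoted into the marts.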
Qualifications (Must‑have)
3+ years (typically 3–6) in a Data Engineering role building production ELT/ETL on a cloud platform (AWS strongly preferred).
Expert SQL and solid Python for data processing/automation.
Proven experience designing data models (staging, marts, star schemas) and standing up a warehouse/lakehouse.
Orchestration, scheduling, and operational ownership (SLAs, alerting, runbooks).
Experience enabling a BI layer (ideally QuickSight) with secure, governed datasets.
Strong collaboration and communication; able to gather requirements from non‑technical stakeholders and translate to data contracts.
Nice‑to‑have
Marketing/Sales/RevOps data (CRM, ads, attribution), Accounting/Finance integrations, or product telemetry/event pipelines.
Stream processing (Kafka/Kinesis), CDC, or near‑real‑time ingestion.
90‑Day Outcomes
Inventory & architecture: clear map of sources, proposed target architecture, and a prioritized backlog aligned with Ops/Engineering.
First pipelines live: automated ingestion into core staging tables, with data quality checks and alerts.
Business marts: at least two curated domains live (e.g., Marketing & Sales) powering certified QuickSight datasets for stakeholders.
Runbook & docs: onboarding‑ready documentation, lineage, and incident playbooks.
Interview Process
15 min — Initial screening (virtual)
45 min — Architecture deep‑dive into our data environment and target platform (virtual)
2 hours — On‑site practical: build or modify a small ETL and modeling exercise, then discuss trade‑offs, quality, and operations
Annual Salary Range
$140,000–$190,000, plus equity and benefits
Benefits
Comprehensive health, dental, vision, and life insurance
401(k) with company match
Meaningful early‑stage equity
Onsite meals, snacks, and close collaboration with founders/tech leaders
Ambitious, fast‑paced startup culture where initiative is rewarded