Energy Jobline ZR

Data Engineer (Pipelines, Quality, Orchestration)

Energy Jobline ZR, Dallas, Texas, United States, 75215


Job Description

About the role

You’ll build the data backbone that powers our keyword→auto-script machine. Your work ensures reliable Semrush/Search Console ingestion, clean schemas, fast feature access, and robust scheduling/monitoring—so models and scripts run on time, every time.

What you’ll do

● Build/own connectors: Semrush API, Google Search Console, internal logs; schedule with Airflow/Prefect.

● Design schemas and tables for raw, curated, and feature layers (warehouse + Postgres).

● Implement data quality checks (freshness, completeness, duplicates, ontology mappings) with alerts.

● Stand up and tune vector infrastructure (pgvector/Pinecone) with indexing and retention policies.

● Expose clean datasets and features to ML services (privacy-aware, audit-ready).

● Optimize cost/perf (partitions, clustering, caching, job concurrency) and SLAs for daily/weekly runs.

● Build simple observability dashboards (job health, latency, data drift signals).

● Partner with ML/NLP on retraining pipelines and with Compliance on audit logs/versioning.
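To give a flavor of the data-quality work above, here is a minimal sketch of freshness and duplicate checks. The function names, thresholds, and result shape are illustrative assumptions, not our actual framework (in practice these would plug into Great Expectations or dbt tests with alerting):

```python
from datetime import datetime, timedelta, timezone

def check_freshness(last_loaded_at: datetime, max_lag: timedelta) -> dict:
    """Pass if the latest load landed within the allowed lag window."""
    lag = datetime.now(timezone.utc) - last_loaded_at
    return {"check": "freshness", "lag_seconds": lag.total_seconds(), "passed": lag <= max_lag}

def check_duplicates(primary_keys: list[str]) -> dict:
    """Flag duplicate primary keys in a batch before it lands in the curated layer."""
    dupes = len(primary_keys) - len(set(primary_keys))
    return {"check": "duplicates", "duplicate_rows": dupes, "passed": dupes == 0}
```

A failing result (e.g. `passed: False` on freshness) would page the on-call channel rather than silently loading stale data.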

What you’ve done

● 3+ years as a Data Engineer (ETL/ELT in production).

● Strong Python and SQL; experience with Airflow/Prefect; dbt is a nice-to-have.

● Worked with cloud warehouses (BigQuery/Snowflake/Redshift) and Postgres.

● Built resilient API ingestions with pagination, rate limits, retries, and backfills.

● Experience with data testing/validation (Great Expectations, dbt tests, or similar).

● Bonus: vector DB ops, GCP/AWS, event streaming (Kafka/PubSub), healthcare data hygiene.
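The "resilient API ingestions" bullet above can be sketched generically. This is an assumed shape, not the Semrush or Search Console client API: `fetch_page(offset)` stands in for a real request and returns `{"rows": [...], "next": offset_or_None}`:

```python
import time
from typing import Callable, Iterator

def paginate(fetch_page: Callable[[int], dict],
             max_retries: int = 3,
             backoff_s: float = 1.0) -> Iterator:
    """Walk an offset-paginated API, retrying each page with exponential backoff."""
    offset: int | None = 0
    while offset is not None:
        for attempt in range(max_retries):
            try:
                page = fetch_page(offset)
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise  # surface the failure so the orchestrator can alert/backfill
                time.sleep(backoff_s * 2 ** attempt)  # back off on rate limits
        yield from page["rows"]
        offset = page["next"]
```

Backfills then reduce to replaying the same generator over a historical date range, with the orchestrator (Airflow/Prefect) owning scheduling and retry-on-task-failure.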

How we’ll measure success (first 90 days)

● Reliable daily Semrush/GSC loads with a 99% on-time SLA and data quality checks.

● Curated tables powering clustering/intent models with documented lineage.

● Feature/embedding store online with 200ms p95 reads for model services.

Tech you’ll touch

Python, SQL, Airflow/Prefect, Postgres, Warehouse (BigQuery/Snowflake/Redshift), dbt (optional), Great Expectations, Docker, Terraform (nice-to-have), pgvector/Pinecone.