Energy Jobline ZR
Data Engineer (Pipelines, Quality, Orchestration)
Energy Jobline ZR, Dallas, Texas, United States, 75215
Job Description
About the role
You’ll build the data backbone that powers our keyword→auto-script machine. Your work
ensures reliable Semrush/Search Console ingestion, clean schemas, fast feature access, and
robust scheduling/monitoring—so models and scripts run on time, every time.
What you’ll do
● Build/own connectors: Semrush API, Google Search Console, internal logs; schedule
with Airflow/Prefect.
● Design schemas and tables for raw, curated, and feature layers (warehouse +
Postgres).
● Implement data quality checks (freshness, completeness, duplicates, ontology
mappings) with alerts.
● Stand up and tune vector infrastructure (pgvector/Pinecone) with indexing and
retention policies.
● Expose clean datasets and features to ML services (privacy-aware, audit-ready).
● Optimize cost/perf (partitions, clustering, caching, job concurrency) and SLAs for
daily/weekly runs.
● Build simple observability dashboards (job health, latency, data drift signals).
● Partner with ML/NLP on retraining pipelines and with Compliance on audit
logs/versioning.
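The freshness and duplicate checks above can be sketched as small Python helpers. This is a minimal illustration, assuming rows carry a `loaded_at` timestamp and a `keyword` key (both hypothetical field names); in practice these checks would live in Great Expectations or dbt tests with alerting attached:

```python
from datetime import datetime, timedelta, timezone

def check_freshness(rows, max_age_hours=24, now=None):
    """Pass only if the newest load timestamp is within the SLA window."""
    now = now or datetime.now(timezone.utc)
    newest = max(r["loaded_at"] for r in rows)
    return (now - newest) <= timedelta(hours=max_age_hours)

def find_duplicates(rows, key="keyword"):
    """Return key values that appear more than once, sorted."""
    seen, dupes = set(), set()
    for r in rows:
        k = r[key]
        (dupes if k in seen else seen).add(k)
    return sorted(dupes)
```

A scheduler task would run these after each load and page on a failed freshness check or a non-empty duplicate list.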
What you’ve done
● 3+ years as a Data Engineer (ETL/ELT in production).
● Strong Python and SQL; experience with Airflow/Prefect; dbt is a nice-to-have.
● Worked with cloud warehouses (BigQuery/Snowflake/Redshift) and Postgres.
● Built resilient API ingestions with pagination, rate limits, retries, and backfills.
● Experience with data testing/validation (Great Expectations, dbt tests, or similar).
● Bonus: vector DB ops, GCP/AWS, event streaming (Kafka/PubSub), healthcare data
hygiene.
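The pagination-with-retries pattern called out above can be sketched generically. `fetch_page` here is a hypothetical client callable standing in for a real Semrush or GSC API wrapper, not an actual SDK method:

```python
import time

def paginate(fetch_page, max_retries=3, backoff_s=0.01):
    """Pull every page from a cursor-paginated endpoint, retrying
    transient failures with exponential backoff.

    fetch_page(cursor) returns (items, next_cursor);
    next_cursor is None on the last page.
    """
    cursor, items = None, []
    while True:
        for attempt in range(max_retries):
            try:
                page, cursor = fetch_page(cursor)
                break
            except ConnectionError:
                if attempt == max_retries - 1:
                    raise  # exhausted retries: surface the error
                time.sleep(backoff_s * 2 ** attempt)
        items.extend(page)
        if cursor is None:
            return items
```

Rate-limit handling and date-windowed backfills layer on top of the same loop: throttle between pages, and drive the cursor from a date range instead of an opaque token.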
How we’ll measure success (first 90 days)
● Reliable daily Semrush/GSC loads with a 99% on-time SLA and passing data quality checks.
● Curated tables powering clustering/intent models with documented lineage.
● Feature/embedding store online with sub-200 ms p95 reads for model services.
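The p95 read-latency target can be checked with a nearest-rank percentile over sampled request latencies; a minimal sketch (a monitoring stack would normally compute this from histograms):

```python
import math

def p95(latencies_ms):
    """Nearest-rank 95th percentile of a latency sample, in ms."""
    ordered = sorted(latencies_ms)
    rank = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return ordered[rank]
```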
Tech you’ll touch
Python, SQL, Airflow/Prefect, Postgres, Warehouse (BigQuery/Snowflake/Redshift), dbt
(optional), Great Expectations, Docker, Terraform (nice-to-have), pgvector/Pinecone.