Remote - Data Tester (Databricks, PySpark, and Big Data) Position

Jobs via Dice, Frankfort, Kentucky, United States

Dice is the leading career destination for tech experts at every stage of their careers. Our client, Technogen, Inc., is seeking a talented Data Tester to join their team.

Company:

Technogen, Inc. – a proven leader in providing full IT services, software development, and solutions for 15 years; a small, woman‑owned minority business with GSA Advantage certification. Offices in VA and MD, with offshore development centers in India.

Location:

Remote

Duration:

12+ Months (Long‑Term Contract)

Job Description

We are seeking an experienced Data Tester with strong expertise in Databricks, PySpark, and Big Data ecosystems.

The ideal candidate will have a solid background in testing data pipelines, ETL workflows, and analytical data models, ensuring data integrity, accuracy, and performance across large‑scale distributed systems.

This role requires hands‑on experience with Databricks, Spark‑based data processing, and strong SQL validation skills, along with familiarity with data lake / Delta Lake testing, automation, and cloud environments (AWS, Azure, or Google Cloud Platform).

Required Qualifications

8+ years of overall experience in data testing / QA within large‑scale enterprise data environments.

5+ years of experience in testing ETL / Big Data pipelines, validating data transformations, and ensuring data integrity.

4+ years of hands‑on experience with Databricks, including notebook execution, job scheduling, and workspace management.

4+ years of experience in PySpark (DataFrame APIs, UDFs, transformations, joins, and data validation logic).

5+ years of strong SQL proficiency (joins, aggregations, window functions, and analytical queries) for validating complex datasets.

3+ years of experience with Delta Lake or data lake testing (schema evolution, ACID transactions, time travel, partition validation); see the second sketch after this list.

3+ years of experience with Python scripting for automation and data validation tasks.

3+ years of experience with cloud‑based data platforms (Azure Data Lake, AWS S3, or Google Cloud Platform BigQuery).

2+ years of experience in test automation for data pipelines using tools like pytest, PySpark test frameworks, or custom Python utilities; see the first sketch after this list.

4+ years of experience applying data warehousing concepts, data modeling (Star/Snowflake), and data quality frameworks.

4+ years of experience with Agile / SAFe methodologies, including story‑based QA and sprint deliverables.

6+ years of experience applying analytical and debugging skills to identify data mismatches, performance issues, and pipeline failures.
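To give a concrete sense of the validation work this role involves, the first sketch below shows pytest-driven PySpark checks. It is a minimal, hypothetical example: the orders fixture, its schema, and the specific assertions are illustrative stand-ins, not requirements taken from this posting.

```python
# Illustrative sketch only: table names, schemas, and checks below are
# hypothetical, not taken from this posting or any real project.
import pytest
from pyspark.sql import SparkSession, functions as F


@pytest.fixture(scope="session")
def spark():
    # Local Spark session for the test run; on Databricks the
    # cluster-provided session would be used instead.
    session = (
        SparkSession.builder
        .master("local[2]")
        .appName("data-validation-tests")
        .getOrCreate()
    )
    yield session
    session.stop()


@pytest.fixture
def orders(spark):
    # Stand-in for a real source read (e.g. ETL output or a Delta table).
    return spark.createDataFrame(
        [(1, "A", 100.0), (2, "B", 250.0), (3, "A", 75.0), (4, None, 120.0)],
        ["order_id", "region", "amount"],
    )


def test_business_key_never_null(orders):
    # Integrity check: the business key must be fully populated.
    assert orders.filter(F.col("order_id").isNull()).count() == 0


def test_business_key_unique(orders):
    # Duplicate check on the business key.
    dupes = orders.groupBy("order_id").count().filter(F.col("count") > 1)
    assert dupes.count() == 0, "duplicate order_ids found"


def test_one_row_per_key_via_window(spark, orders):
    # SQL window-function validation: rank rows per key with ROW_NUMBER(),
    # keep rank 1, and confirm the result matches the distinct key count.
    orders.createOrReplaceTempView("orders")
    deduped = spark.sql("""
        SELECT * FROM (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY order_id
                                      ORDER BY amount DESC) AS rn
            FROM orders
        ) ranked
        WHERE rn = 1
    """)
    assert deduped.count() == orders.select("order_id").distinct().count()
```

Run with pytest, this exercises the same DataFrame APIs, SQL window functions, and Python automation skills the list above calls for.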
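The second sketch covers the Delta Lake testing themes (time travel, schema evolution). It is again hypothetical: the table path is invented, and it assumes a Databricks (or local delta-spark) environment where the Delta table already has at least two versions.

```python
# Illustrative sketch: assumes a Databricks (or local delta-spark)
# environment with a Delta table that has at least two versions.
# The table path is a hypothetical placeholder.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

TABLE_PATH = "/mnt/datalake/silver/orders"  # hypothetical location

# Find the latest version from the Delta transaction log.
history = spark.sql(f"DESCRIBE HISTORY delta.`{TABLE_PATH}`")
latest_version = history.agg(F.max("version")).collect()[0][0]

current = spark.read.format("delta").load(TABLE_PATH)
previous = (
    spark.read.format("delta")
    .option("versionAsOf", latest_version - 1)  # Delta time travel
    .load(TABLE_PATH)
)

# Append-only load check: row counts must not shrink between versions.
assert current.count() >= previous.count(), "row count regressed after load"

# Schema-evolution check: new columns are allowed, but existing columns
# must keep their names and types.
prev_fields = {f.name: f.dataType for f in previous.schema.fields}
curr_fields = {f.name: f.dataType for f in current.schema.fields}
drifted = sorted(n for n, t in prev_fields.items() if curr_fields.get(n) != t)
assert not drifted, f"schema drift on columns: {drifted}"
```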

Preferred Qualifications

Experience with CI/CD for Databricks or data testing (GitHub Actions, Jenkins, Azure DevOps).

Exposure to BI validation (Power BI, Tableau, Looker) for verifying downstream reports.

Knowledge of REST APIs for metadata validation or system integration testing.

Familiarity with big data tools like Hive, Spark SQL, Snowflake, and Airflow.

Cloud certifications (e.g., Microsoft Azure Data Engineer Associate or AWS Big Data Specialty) are a plus.

Seniority level

Mid‑Senior level

Employment type

Full‑time

Job function

Engineering and Information Technology

Industries

Software Development
