CrackaJack Digital Solutions LLC
PySpark Databricks Engineer
CrackaJack Digital Solutions LLC, Houston, Texas, United States, 77246
Job Title: PySpark and Databricks Developer
Location: Houston, TX (Hybrid)
Key Responsibilities:
- Design, develop, and optimize data pipelines and transformations using PySpark and Databricks (see the short illustrative sketch at the end of this posting).
- Collaborate with data architects and analysts to define and implement scalable data models and frameworks.
- Build and maintain complex data ingestion and processing workflows for large, distributed datasets.
- Develop reusable and efficient code following best practices in coding, testing, and deployment.
- Optimize Spark jobs for performance, scalability, and reliability in production environments.
- Work closely with cross-functional teams to ensure data quality, consistency, and integrity.
- Contribute to continuous improvement of the data engineering ecosystem and CI/CD processes.

Required Skills and Qualifications:
- 5+ years of experience in software development with a focus on Python and PySpark.
- Hands-on expertise with the Databricks platform, including cluster management, notebooks, and job orchestration.
- Strong programming fundamentals (data structures, algorithms, debugging, version control).
- Experience with Delta Lake, Spark SQL, and data lake architectures.
- Solid understanding of distributed computing, data partitioning, and Spark performance tuning.
- Familiarity with cloud platforms such as Azure, AWS, or GCP.
- Excellent communication and problem-solving skills; able to explain complex technical concepts clearly.
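
For context, here is a minimal sketch of the kind of PySpark and Delta Lake pipeline work described above. It is illustrative only: the source path, target path, and column names are hypothetical assumptions, and it presumes a Databricks (or delta-spark-enabled) runtime.

# Minimal, illustrative PySpark pipeline sketch (paths and columns are hypothetical).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_pipeline_sketch").getOrCreate()

# Ingest raw CSV data; schema inference is used here only for brevity.
raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/mnt/raw/orders")
)

# Basic cleansing and a derived column using built-in Spark SQL functions.
cleaned = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_date", F.to_date("order_ts"))
       .filter(F.col("amount") > 0)
)

# Write to a Delta Lake table, partitioned by date to support query pruning downstream.
(
    cleaned.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .save("/mnt/curated/orders_delta")
)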