Purple Drive
POSITION SUMMARY
We are seeking an experienced Data Engineer specializing in the Databricks platform for a 6-month contract position based in Seattle, WA. This role focuses on designing, building, and deploying robust data pipelines and ETL processes within a cloud data platform environment. The ideal candidate will bring strong expertise in Databricks, data modeling, and cross-functional collaboration to support enterprise data governance initiatives.
RESPONSIBILITIES
DATA PIPELINE DEVELOPMENT
- Design, build, and deploy data extraction, transformation, and loading (ETL) processes and pipelines (see the sketch below)
- Extract data from various sources, including databases, APIs, and data files
- Develop and maintain scalable data pipelines within the Databricks Cloud Data Platform
- Ensure data quality, reliability, and performance across all pipeline processes
- Implement data validation and monitoring mechanisms
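As a purely illustrative example of the pipeline work above, here is a minimal PySpark sketch of an extract-transform-load step with a simple validation gate. All table and column names (raw.orders, order_id, curated.orders_clean) are hypothetical placeholders, not systems referenced in this posting.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: read from a (hypothetical) raw source table.
raw = spark.read.table("raw.orders")

# Transform: normalize types, drop null keys and duplicates.
clean = (
    raw
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .filter(F.col("order_id").isNotNull())
    .dropDuplicates(["order_id"])
)

# Validate: fail fast rather than load an empty or broken result.
if clean.count() == 0:
    raise ValueError("Validation failed: no rows survived cleaning")

# Load: persist to a curated Delta table.
clean.write.format("delta").mode("overwrite").saveAsTable("curated.orders_clean")
```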
DATABRICKS PLATFORM MANAGEMENT
- Build comprehensive data models that reflect domain expertise and meet current business needs
- Ensure data models remain flexible and adaptable as business strategy evolves
- Monitor and optimize Databricks cluster performance for cost-effective scaling and resource utilization
- Implement and maintain Delta Lake for optimized data storage, ensuring data reliability, performance, and versioning (see the sketch below)
- Leverage Databricks Unity Catalog for data governance and collaboration
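To make the Delta Lake and Unity Catalog duties concrete, here is a brief sketch of routine maintenance and governance commands, issued as Spark SQL from Python. The three-level name main.sales.orders and the analysts group are hypothetical; OPTIMIZE/ZORDER, VERSION AS OF, and GRANT are standard Databricks Delta and Unity Catalog SQL.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compact small files and co-locate rows on a common filter column.
spark.sql("OPTIMIZE main.sales.orders ZORDER BY (customer_id)")

# Delta versioning ("time travel"): query an earlier snapshot of the table.
previous = spark.sql("SELECT * FROM main.sales.orders VERSION AS OF 0")

# Unity Catalog governance: grant read-only access to an analyst group.
spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `analysts`")
```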
DEVOPS & AUTOMATION
- Automate CI/CD pipelines for data workflows using Azure DevOps
- Implement version control and deployment strategies for data pipelines
- Ensure automated testing and quality assurance for data processes (see the sketch below)
- Maintain infrastructure-as-code practices for data platform components
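One common way to meet the automated-testing expectation is to unit-test transformations with pytest against a local SparkSession, so the same suite can run in an Azure DevOps pipeline on every commit. The dedupe_orders function below is a hypothetical transformation written for this sketch, not code from the role.

```python
import pytest
from pyspark.sql import SparkSession, functions as F

def dedupe_orders(df):
    # Hypothetical transformation under test: drop null keys and duplicates.
    return df.filter(F.col("order_id").isNotNull()).dropDuplicates(["order_id"])

@pytest.fixture(scope="module")
def spark():
    # Local session so the test runs in CI without a Databricks cluster.
    return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()

def test_dedupe_orders(spark):
    df = spark.createDataFrame(
        [(1, "a"), (1, "a"), (None, "b")], ["order_id", "item"]
    )
    # Two bad rows (one duplicate, one null key) leave exactly one survivor.
    assert dedupe_orders(df).count() == 1
```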
DATA ARCHITECTURE & STORAGE
- Demonstrate expertise in database storage concepts, including data lakes, relational databases, NoSQL, graph databases, and data warehousing
- Design and implement efficient data storage solutions
- Optimize data access patterns and query performance
- Ensure proper data partitioning and indexing strategies (see the sketch below)
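As a sketch of the partitioning point above: on Delta Lake, partitioning a large table by a low-cardinality date column lets queries that filter on that column prune whole partitions instead of scanning the full table. The raw.events and curated.events table names are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

events = spark.read.table("raw.events")  # hypothetical source table

# Partition by event date: a daily-filtered query reads only the
# matching partitions rather than the entire table.
(
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .write.format("delta")
    .partitionBy("event_date")
    .mode("overwrite")
    .saveAsTable("curated.events")
)
```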
COLLABORATION & COMMUNICATION
- Collaborate with cross-functional teams to support data governance initiatives
- Communicate technical concepts effectively to both technical and non-technical audiences
- Work with business stakeholders to understand data requirements
- Provide documentation and knowledge transfer for data solutions
QUALIFICATIONS
REQUIRED EXPERIENCE
- 5+ years of experience in data engineering and ETL development
- 3+ years of hands-on experience with the Databricks platform
- Strong experience with cloud data platforms (Azure, AWS, or GCP)
- Proven track record in building and maintaining data pipelines at scale
TECHNICAL SKILLS
Databricks Expertise:
- Databricks workspace and cluster management
- Delta Lake implementation and optimization
- Databricks Unity Catalog
- Spark and PySpark programming
Programming & Development:
- Strong coding skills in Python, Scala, or SQL
- Experience with data transformation and data quality frameworks
- Knowledge of data integration patterns and best practices
Database & Storage:
- Data lake architecture and design
- Relational databases (SQL Server, PostgreSQL, etc.)
- NoSQL databases (MongoDB, Cassandra, etc.)
- Graph databases and data warehousing concepts
DevOps & Automation:
- Azure DevOps or similar CI/CD tools
- Infrastructure as Code (Terraform, ARM templates)
- Git version control and branching strategies
SOFT SKILLS
- Strong analytical and problem-solving abilities
- Excellent written and verbal communication skills
- Ability to explain complex technical concepts to diverse audiences
- Collaborative mindset for cross-functional team environments
- Detail-oriented with a focus on data quality and reliability
PREFERRED QUALIFICATIONS
- Databricks certification (Data Engineer Associate/Professional)
- Experience with real-time data processing and streaming
- Knowledge of data governance frameworks and best practices
- Experience with data visualization tools (Power BI, Tableau)
- Background in Agile/Scrum development methodologies