Anblicks

Lead Databricks Engineer

Anblicks, Dallas, Texas, United States, 75215


Job Summary:

As a Databricks Lead, you will be a critical member of our data engineering team, responsible for designing, developing, and optimizing our data pipelines and platforms on Databricks, primarily leveraging AWS services. You will play a key role in implementing robust data governance with Unity Catalog and ensuring cost-effective data solutions. This role requires a strong technical leader who can mentor junior engineers, drive best practices, and contribute hands-on to complex data challenges.

Responsibilities:

* Databricks Platform Leadership:
  * Lead the design, development, and deployment of large-scale data solutions on the Databricks platform.
  * Establish and enforce best practices for Databricks usage, including notebook development, job orchestration, and cluster management.
  * Stay abreast of the latest Databricks features and capabilities, recommending and implementing improvements.
* Data Ingestion and Streaming (Kafka):
  * Architect and implement real-time and batch data ingestion pipelines using Apache Kafka for high-volume data streams.
  * Integrate Kafka with Databricks for seamless data processing and analysis.
  * Optimize Kafka consumers and producers for performance and reliability.
* Data Governance and Management (Unity Catalog):
  * Implement and manage data governance policies and access controls using Databricks Unity Catalog.
  * Define and enforce data cataloging, lineage, and security standards within the Databricks Lakehouse.
  * Collaborate with data governance teams to ensure compliance and data quality.
* AWS Cloud Integration:
  * Leverage various AWS services (S3, EC2, Lambda, Glue, etc.) to build a robust and scalable data infrastructure.
  * Manage and optimize AWS resources for Databricks workloads.
  * Ensure secure and compliant integration between Databricks and AWS.
* Cost Optimization:
  * Proactively identify and implement strategies for cost optimization across Databricks and AWS resources.
  * Monitor DBU consumption, cluster utilization, and storage costs, providing recommendations for efficiency gains.
  * Implement autoscaling, auto-termination, and right-sizing strategies to minimize operational expenses.
* Technical Leadership & Mentoring:
  * Provide technical guidance and mentorship to a team of data engineers.
  * Conduct code reviews, promote coding standards, and foster a culture of continuous improvement.
  * Lead technical discussions and decision-making for complex data engineering problems.
* Data Pipeline Development & Optimization:
  * Develop, test, and maintain robust and efficient ETL/ELT pipelines using PySpark/Spark SQL.
  * Optimize Spark jobs for performance, scalability, and resource utilization.
  * Troubleshoot and resolve complex data pipeline issues.
* Collaboration:
  * Work closely with data scientists, analysts, and other engineering teams to understand data requirements and deliver solutions.
  * Communicate technical concepts effectively to both technical and non-technical stakeholders.

Qualifications:

* Bachelor's or Master's degree in Computer Science, Data Engineering, or a related quantitative field.
* 7+ years of experience in data engineering, including 3+ years in a lead or senior role.
* Proven expertise in designing and implementing data solutions on Databricks.
* Strong hands-on experience with Apache Kafka for real-time data streaming.
* In-depth knowledge and practical experience with Databricks Unity Catalog for data governance and access control.
* Solid understanding of AWS cloud services and their application in data architectures (S3, EC2, Lambda, VPC, IAM, etc.).
* Demonstrated ability to optimize cloud resource usage and implement cost-saving strategies.
* Proficiency in Python and Spark (PySpark/Spark SQL) for data processing and analysis.
* Experience with Delta Lake and other modern data lake formats.
* Excellent problem-solving, analytical, and communication skills.

Added Advantage (Bonus Skills):

* Experience with Apache Flink for stream processing.
* Databricks certifications.
* Experience with CI/CD pipelines for Databricks deployments.
* Knowledge of other cloud platforms (Azure, GCP).