Apolis

Databricks Architect

Apolis, Jersey City, New Jersey, United States, 07390

Role: Databricks Architect

Client location: NJ

End client: BioMarin

100% remote

Description

The Databricks Architect - Lead is a critical role within our Data and Analytics practice, tasked with designing, implementing, and optimizing advanced data architectures using Databricks. This role involves close collaboration with data scientists, analysts, and business stakeholders to deliver scalable, secure, and efficient data solutions that align with organizational goals. The ideal candidate will bring deep expertise in Databricks, Apache Spark, and cloud platforms, with a proven ability to lead complex data architecture initiatives.

In this role, you will architect data lakes, data warehouses, and real-time processing systems, optimize Databricks clusters for performance and cost, and develop robust ETL/ELT pipelines. You will also ensure data governance and compliance, integrate Databricks with cloud ecosystems, and stay ahead of industry trends to drive innovation. Reporting to the Director of Data Architecture, you will play a key role in shaping our data strategy and delivering business value.

Responsibilities

1. Architecture Design

• Design and maintain scalable data architectures using Databricks, including data lakes, data warehouses, and real-time processing systems.

• Create detailed blueprints for data processes and flows, ensuring alignment with business objectives and technical requirements.

2. Cluster Management

• Configure, manage, and optimize Databricks clusters to achieve high performance and cost efficiency.

• Implement best practices for cluster management, including autoscaling, cluster policies, and performance tuning.

3. Pipeline Development

• Develop and implement ETL/ELT pipelines using Databricks and Apache Spark to handle large-scale data processing.

• Ensure pipelines are efficient, scalable, and meet performance and reliability standards.

4. Data Governance

• Enforce data governance policies, security measures, and compliance standards within the Databricks environment.

• Leverage tools like Unity Catalog to manage data governance and implement IAM roles, VPCs, and encryption protocols.

5. Collaboration

• Partner with data scientists, analysts, and business stakeholders to understand data needs and deliver impactful solutions.

• Communicate complex technical concepts clearly to diverse audiences, fostering alignment and collaboration.

6. Integration

• Integrate Databricks with cloud services such as AWS S3, Azure Data Lake Storage, or Google Cloud Storage to build a cohesive data ecosystem.

• Ensure seamless data flow and interoperability with existing systems and tools.

7. Innovation

• Stay current with the latest advancements in Databricks, Delta Lake, Databricks SQL, and related technologies.

• Apply industry best practices to enhance data capabilities and drive continuous improvement.

About You (Desired Profile)

• 10-15 years of experience in data architecture, with a strong focus on Databricks and Apache Spark.

• Proven expertise in designing and implementing data lakes, data warehouses, and real-time processing systems.

• Hands-on experience with Databricks cluster management, optimization, and performance tuning.

• Knowledge of pharmaceutical data is mandatory.

• Strong proficiency in developing ETL/ELT pipelines using Apache Spark.

• In-depth knowledge of data governance, security, and compliance in cloud environments.

• Exceptional communication and collaboration skills, with the ability to engage effectively with cross-functional teams.

• Extensive experience with cloud platforms (AWS, Azure, or GCP) and integrating Databricks with cloud services.

• Proficiency in programming languages such as Python, Scala, Java, SQL, or R.

• Bachelor's degree in Computer Science, Data Science, or a related field; an advanced degree (e.g., Master's) is a plus.

• Certifications in Databricks and AWS technologies are required.