Tredence Inc.
Principal Hadoop Architect
Location: Remote / Hybrid
Focus: Hadoop Ecosystem Optimization, Design & Code Frameworks, and Design Standards
Role Objective:
We are seeking a Principal Hadoop Architect to serve as the central authority for a large‑scale Big Data ecosystem. You will define the "Golden Standards" for data ingestion, storage, and processing, ensuring our on‑premises environment is highly optimized and architecturally aligned for an eventual move to the cloud.
Core Responsibilities
Technical Design Authority
Standardization: Define and enforce "Blueprints" for Hive schemas, Spark configurations, and Kafka topics to be used across all engineering and analyst teams.
Reference Architecture: Maintain the official "Big Data Playbook," detailing approved design patterns for batch vs. real‑time processing.
Review Board: Lead the Architecture & Design Review Board (ADRB) to vet new data projects, ensuring they don’t introduce technical debt or inefficient resource patterns.
Ecosystem Optimization & "Fitness"
Performance Tuning: Identify "heavy‑hitter" queries and inefficient YARN resource allocations. Implement mandatory partitioning and bucketing standards to reduce HDFS overhead.
Storage Rationalization: Implement tiered storage (Hot/Warm/Cold) policies. Enforce standard file formats (Parquet/Avro) to optimize compression and predicate push‑down.
Lifecycle Management: Establish data retention and archival standards to prevent "Data Swamp" growth, ensuring we only store what provides value.
Cloud‑Ready Engineering (The "Clean Room" Approach)
Decoupling Strategy: Lead the effort to decouple storage (HDFS) from compute (YARN) through architectural standards, making future cloud migration a "plug‑and‑play" exercise.
API‑First Standards: Encourage the use of abstraction layers and APIs so that downstream applications aren’t hard‑coded to specific Hadoop versions.
Containerization Strategy: Provide guidance on moving localized workloads toward Kubernetes/Docker-friendly designs.
Security & Multi‑Tenancy
Multi‑Tenant Governance: Design a robust "Quotas and Queues" system to ensure a single team’s rogue Spark job doesn’t crash the cluster for everyone else.
Unified Security: Standardize Apache Ranger policies and Kerberos implementation across all nodes.
Technical Requirements
Expert‑Level Hadoop: Mastery of the Cloudera/Hortonworks stack, specifically Hive LLAP, YARN, and HDFS.
Standardization Experience: Proven track record of creating enterprise design standards used by multiple engineering teams.
Processing Frameworks: Deep knowledge of Spark (Core/SQL) optimization and Kafka event‑driven architecture.
Tooling Mastery: Experience with Apache Software Foundation projects such as Apache Atlas for lineage and Apache Ranger for centralized security.
Soft Skills: Ability to influence senior leadership and guide diverse engineering teams without direct reporting authority.
Seniority Level: Mid‑Senior level
Employment Type: Full‑time
Job Function: Consulting
Industries: Business Consulting and Services
Benefits
Medical insurance
Vision insurance
401(k)