Custom Software Systems Inc.
Azure Databricks Engineer
Custom Software Systems Inc., Leesburg, Virginia, United States, 22075
Custom Software Systems, Inc. (CSS) is seeking an Azure Databricks Engineer to join our team. The Azure Databricks Engineer will serve as a critical technical resource within the Financial Regulatory Agency's Data Modernization Section, supporting the Cloud Data Management and Analytics Platform. This platform is the enterprise foundation for the agency's data management, advanced analytics, and AI capabilities.
Working within the Chief Data Officer Staff organization, this role is essential to the agency’s mission of maintaining stability and confidence in the nation’s financial system through data-driven modernization. The engineer will lead the vision, design, development, and transformation of the agency's data strategy, helping it transition from siloed data management to a model in which enterprise data becomes a strategic resource that is securely shared and used for critical regulatory functions.
Responsibilities
Enterprise Data Platform Development & Operations
Design and build scalable ETL/ELT pipelines using Azure Databricks, Delta Lake, and Apache Spark to process more than 30 terabytes of structured and unstructured financial examination data, supporting more than 25 active business initiatives.
Architect and continuously mature the lakehouse solution, implementing advanced data modeling, efficient partitioning, and Delta Lake optimizations (e.g., designing medallion architectures with bronze, silver, and gold layers).
Configure, manage, and optimize Databricks workspaces, clusters, and compute resources, while implementing comprehensive role-based access controls, security policies, and cost optimization strategies.
Develop reusable architecture patterns and cloud technology reference models to accelerate adoption and standardization across the enterprise.
Legacy System Modernization and NLP Analytics
Lead the technical migration and modernization of a legacy on-premises Natural Language Processing (NLP) solution to Azure-native tools within the cloud platform.
Develop and optimize pipelines that extract intelligence from unstructured and semi-structured examination data, leveraging Azure AI Services and Azure Cognitive Services for advanced NLP capabilities (e.g., entity identification, relationship mapping).
Design solution patterns that can be replicated for similar ML/NLP document extraction needs across the agency.
Real-Time and Batch Processing Architecture
Develop comprehensive real-time data processing solutions using Structured Streaming to support time-sensitive financial monitoring and regulatory examination workflows, ensuring low-latency data availability and exactly-once processing semantics.
Implement high-performance batch processing jobs for data transformation, aggregation, and integration, designing and maintaining hybrid processing architectures (lambda or kappa) that seamlessly handle both streaming and batch workloads.
Machine Learning Operations and Data Science Support
Collaborate with data scientists to deploy ML models into production environments, implementing and managing MLflow for comprehensive model lifecycle management (versioning, experiment tracking, model registry).
Create automated ML pipelines for model training, validation, and inference that integrate with cloud platform services, supporting operational enterprise AI capabilities.
Infrastructure as Code and DevOps Practices
Develop and maintain Terraform Infrastructure as Code (IaC) scripts for cloud platform operational builds, ensuring reproducibility and consistency.
Implement continuous integration and deployment (CI/CD) practices using the agency’s Enterprise GitHub, and follow the agency’s Change Control Board release process.
Advanced Data Security and Governance Implementation
Implement advanced data security capabilities, including enterprise data labeling, fine-grained access control (sub-document/chunk level), and data tokenization and masking techniques.
Ensure all solutions comply with agency governance frameworks (e.g., Data Action Working Group, Enterprise Data Council) and support the agency's security office, Assess and Authorize process, and Zero Trust architecture principles.
Clearance
Must be clearable.
Citizenship
US Citizenship
Required Qualifications
5+ years of progressive experience as a Data Engineer.
3+ years of hands‑on experience with Azure Databricks and related Azure data ecosystem services, ideally in regulated or government environments where data security, governance, and compliance are paramount.
Core Technical Expertise
Expert‑level proficiency in Apache Spark and Delta Lake, with deep understanding of distributed data processing architectures, performance optimization techniques (e.g., Spark internals, partitioning, broadcast joins), and Delta Lake's ACID transaction capabilities.
Advanced capabilities in Python, SQL, and Scala for developing robust data engineering solutions.
Proven experience architecting and optimizing large‑scale data pipelines handling terabytes of data.
Comprehensive knowledge of both batch and real‑time ETL/ELT processes and expertise in multiple data modeling approaches (dimensional modeling, data vault, lakehouse modeling).
Azure Cloud Platform Mastery
Comprehensive, hands‑on experience with the integrated Cloud Data Management and Analytics Platform suite:
Azure Databricks (Primary data engineering platform)
Azure Data Factory (Orchestration of complex workflows)
Azure Data Lake Storage (Enterprise data repository)
Azure Synapse Analytics (Integrated analytics capabilities)
Proficiency with supporting services: Azure Machine Learning, Azure AI Services (including Azure OpenAI), Cosmos DB, App Services, Functions, and API Management.
Proficiency with Terraform for Infrastructure as Code (IaC).
Deep understanding of modern lakehouse architecture and experience implementing data fabric patterns supporting hybrid infrastructure.
Knowledge, Skills & Abilities
Exceptional analytical and problem‑solving abilities with a proven track record of resolving complex data engineering challenges.
Excellent written and verbal communication skills, with the ability to explain complex technical concepts to diverse audiences (business users, executives, technical teams).
Deep experience with Agile and Scrum methodologies and demonstrated flexibility to support multiple simultaneous initiatives.
Commitment to maintaining current knowledge in rapidly evolving Azure and Databricks technologies.
Strong understanding of security and privacy requirements in financial regulatory contexts.
Certificates
Microsoft Certified: Azure Data Engineer Associate (Preferred).
Databricks Certified Data Engineer Professional or Associate (Preferred).
Education
BA/BS/MS in Computer Science, Engineering, Data Science, or equivalent professional experience.
Why Join Us?
This role offers the opportunity to be instrumental in modernizing the agency’s data capabilities, directly enabling advanced analytics and AI solutions that support critical financial examination and regulatory functions. You will be shaping enterprise solution patterns that create lasting impact and will be involved in cutting‑edge initiatives, including enterprise AI model management and multi‑cloud data fabric implementations. Your work will directly contribute to the agency’s mission to maintain stability and confidence in the nation’s financial system.
Compensation & Benefits
Wage Range: Negotiable
General Benefits: Custom Software Systems, Inc. offers our employees a competitive benefits package that may include:
Health insurance plans
Health Savings Account (HSA)
Dental
Vision
Long‑term disability
Short‑term disability
Basic term life insurance
Supplemental term life insurance for employees, spouses, and dependents
SIMPLE IRA
Parking/Commuting expense reimbursement
Training/Education
Compensation range must be coordinated with and approved by the CSS Chief Operating Officer (COO).
Compensation & Benefits information is required for all Maryland Employers effective October 1, 2024.