Link Consulting Services

Software Data Engineer

Link Consulting Services, Plano, Texas, US 75086


Job Description

Expected start date: Dec 1

Duration of engagement: 3 months, with potential extension

Work location: Alpharetta, GA or Plano, TX

Work location model: Hybrid, 3 days in office

About the Role

The ideal candidate will be responsible for designing and maintaining modern, scalable data solutions on Azure using Databricks. This includes building data pipelines, ETL / ELT workflows, and architectures such as Data Lakes, Warehouses, and Lakehouses for both real-time and batch processing. The role involves integrating large datasets from diverse sources, implementing Delta Lake, and preparing data for machine learning through feature stores.

Key Responsibilities

Design, develop, and optimize scalable data pipelines and ETL / ELT workflows using Databricks on Azure

Build and maintain modern data architectures (Data Lake, Data Warehouse, Lakehouse) for real-time streaming and batch processing on Azure

Implement data integration solutions for large-scale datasets across diverse data sources using Delta Lake and other data formats

Create feature stores and data preparation workflows for machine learning applications on Azure

Develop and maintain data quality frameworks and implement data validation checks

Collaborate with data scientists, ML engineers, analysts, and business stakeholders to deliver high-quality, production‑ready data solutions

Monitor, troubleshoot, and optimize data workflows for performance, cost‑efficiency, and reliability

Implement data governance, security, and compliance standards across all data processes

Create and maintain comprehensive technical documentation for data pipelines and architectures

Required Qualifications

Data Architecture: Deep understanding of Data Lake, Data Warehouse, and Lakehouse concepts with hands‑on implementation experience

Databricks & Spark: 3+ years of hands‑on experience with Databricks on Azure, Apache Spark (PySpark / Spark SQL), Delta Lake optimization

Azure Platform: 3+ years working with Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), Azure Synapse Analytics, Azure ML Studio, Azure Databricks

Programming: Strong proficiency in Python (including pandas, NumPy), SQL, and Unix / Linux shell scripting; experience with Java or Scala is a plus

Streaming: 3+ years’ experience with Apache Kafka or Azure Event Hubs, and Azure Stream Analytics

DevOps: Hands‑on experience with Git, CI / CD pipelines (Azure DevOps, GitHub Actions), and build tools (Maven, Gradle)

Orchestration: Working knowledge of workflow schedulers (Apache Airflow, Azure Data Factory, Databricks Workflows, TWS)

Problem‑solving: Strong analytical and debugging skills with ability to work in agile / scrum environments

Preferred Qualifications

Experience with ML frameworks and libraries (scikit‑learn, TensorFlow, PyTorch) for data preparation and feature engineering on Azure

Experience with vector databases (Azure AI Search, Pinecone, Weaviate, Milvus) and RAG (Retrieval Augmented Generation) architectures

Experience with modern data transformation tools (dbt, Spark Structured Streaming on Databricks)

Understanding of LLM applications, prompt engineering, and AI agent frameworks (Azure OpenAI Service, Semantic Kernel)

Familiarity with containerization (Docker, Azure Kubernetes Service)

Experience with monitoring and observability tools (Azure Monitor, Application Insights, Datadog, Grafana)

Certifications in Databricks, Azure Data Engineer Associate, Azure AI Engineer, or Azure Solutions Architect

Educational Background

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
