SelectMinds
Senior Data Engineer – Big Data & Cloud Integration (L3)
SelectMinds, Dallas, Texas, United States, 75215
Benefits:
HYBRID
Competitive salary
Opportunity for advancement
Senior Data Engineer – Big Data & Cloud Integration | Dallas, TX – Hybrid | Long-Term | In-Person Interview
Translate complex cross-functional business requirements and functional specifications into logical program designs and data solutions.
Partner with the product team to understand business needs and specifications.
Solve complex architecture, design and business problems.
Coordinate, execute, and participate in component integration testing (CIT), system integration testing (SIT), and user acceptance testing (UAT) scenarios to identify application errors and ensure quality software deployment.
Work continuously with cross-functional development teams (data analysts and software engineers) to create PySpark jobs using Spark SQL, and help them build reports on top of data pipelines.
Build, test, and enhance data curation pipelines; integrate data from a wide variety of sources (DBMSs, file systems, and APIs) to support OKR and metrics development with high data quality and integrity.
Execute the development, maintenance, and enhancement of data ingestion solutions of varying complexity across data sources such as DBMSs, file systems (structured and unstructured), APIs, and streaming, on both on-premises and cloud infrastructure.
Responsible for the design, implementation, and architecture of very large-scale data intelligence solutions around big data platforms.
Build data warehouse structures and create fact, dimension, and aggregate tables using dimensional modeling (Star and Snowflake schemas).
Develop Spark applications in PySpark on a distributed environment to load large numbers of CSV files with differing schemas into Hive ORC tables.
Perform ETL transformations on data loaded into Spark DataFrames, using in-memory computation.
Develop and implement data pipelines using AWS services such as Kinesis and S3 to process data in real time.
Work with monitoring, logging, and cost-management tools that integrate with AWS.
Schedule Spark jobs using the Airflow scheduler and monitor their performance.
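The PySpark duties above (loading CSV files with differing schemas into Hive ORC tables, with in-memory ETL on DataFrames) could look roughly like the following sketch. All paths, database, and table names here ("analytics.events", the CSV paths) are invented for illustration and are not details from the posting:

```python
# Hypothetical sketch: ingest CSV files with differing schemas into a
# Hive-managed ORC table via PySpark. Not a definitive implementation.

def normalize_columns(columns):
    """Map raw CSV headers to snake_case so files with inconsistently
    cased or padded headers converge on one target schema."""
    return [c.strip().lower().replace(" ", "_") for c in columns]


def load_csvs_to_hive_orc(csv_paths, target_table="analytics.events"):
    """Read CSV files whose column sets differ, align them by name, and
    append the result to a Hive ORC table (table name is an assumption)."""
    # Imported lazily so normalize_columns stays usable without Spark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("csv_to_hive_orc")
        .enableHiveSupport()
        .getOrCreate()
    )

    frames = [
        spark.read.option("header", True).csv(path) for path in csv_paths
    ]
    frames = [f.toDF(*normalize_columns(f.columns)) for f in frames]

    # unionByName(allowMissingColumns=True) (Spark 3.1+) fills columns
    # absent from a given file with nulls, reconciling schema drift.
    merged = frames[0]
    for frame in frames[1:]:
        merged = merged.unionByName(frame, allowMissingColumns=True)

    # Transformations on the DataFrame execute as in-memory computation
    # on the executors; the ORC write materializes the curated result.
    merged.write.mode("append").format("orc").saveAsTable(target_table)
```

In practice a job like this would be submitted with spark-submit and could be scheduled from an Airflow DAG, matching the scheduling responsibility listed above.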
Flexible work from home options available.