TP-LINK
Sr. Big Data Engineer - Data Infrastructure
TP-LINK, Irvine, California, United States, 92713
Overview
Sr. Big Data Engineer - Data Infrastructure at TP-Link. This role focuses on designing and building scalable data pipelines and managing data infrastructure to support large-scale data processing in production.

Responsibilities
Design and build scalable data pipelines: Develop and maintain high-performance, large-scale data ingestion and transformation pipelines, including ETL/ELT processes, data de-identification, and security management.
Data orchestration and automation: Develop and manage automated data workflows using tools like Apache Airflow to schedule pipelines, manage dependencies, and ensure reliable, timely data processing and availability (see the orchestration sketch after this list).
AWS integration and cloud expertise: Build data pipelines integrated with AWS cloud-native storage and compute services, leveraging scalable cloud infrastructure for data processing.
Monitoring and data quality: Implement comprehensive monitoring, logging, and alerting to ensure high availability, fault tolerance, and data quality through self-healing strategies and robust data validation processes.
Technology innovation: Stay current with emerging big data technologies and industry trends, recommending and implementing new tools and approaches to continuously improve data infrastructure.
Technical leadership: Provide technical leadership for data infrastructure teams and guide architecture decisions and system design best practices. Mentor junior engineers through code reviews and knowledge sharing, lead complex projects from concept to production, and help foster a culture of operational excellence.
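As a rough illustration of the orchestration work described above, the following is a minimal Airflow DAG sketch, assuming Airflow 2.4 or later. The DAG id, task names, schedule, and data are hypothetical placeholders, not details from this posting.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder extract step: pull raw records from a source system.
    return ["record-1", "record-2"]


def transform(**context):
    # Placeholder transform step: read the upstream task's output via XCom.
    records = context["ti"].xcom_pull(task_ids="extract")
    return [r.upper() for r in records]


with DAG(
    dag_id="example_etl_pipeline",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # run once per day
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    # Dependency: transform runs only after extract succeeds.
    extract_task >> transform_task
```

The `>>` operator declares the task dependency, and the `default_args` retries give the kind of reliable, scheduled execution the orchestration responsibility refers to.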
Qualifications
Experience requirements: 5+ years in data engineering, software engineering, or data infrastructure, with proven experience building and operating large-scale data pipelines and distributed systems in production, including terabyte-scale big data environments.
Programming proficiency: Strong Python skills for building data pipelines and processing jobs, with the ability to write clean, maintainable, and efficient code. Experience with Git version control and collaborative development workflows is required.
Distributed systems expertise: Deep knowledge of distributed systems and parallel processing concepts. Proficient in debugging and performance tuning large-scale data systems, with an understanding of data partitioning, sharding, consistency, and fault tolerance in distributed data processing.
Big data frameworks: Strong proficiency in big data processing frameworks such as Apache Spark for batch processing and other relevant batch processing technologies (see the Spark sketch after this list).
Database and data warehouse expertise: Strong understanding of relational database concepts and data warehouse principles.
Workflow orchestration: Hands-on experience with data workflow orchestration tools like Apache Airflow or AWS Step Functions for scheduling, coordinating, and monitoring complex data pipelines.
Problem solving and collaboration: Excellent problem-solving skills with strong attention to detail and the ability to work effectively in collaborative team environments.
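As a rough sketch of the kind of Spark batch processing named above, here is a minimal PySpark job that aggregates daily event counts. The bucket paths, column names, and application name are hypothetical examples, not details from this posting, and the sketch assumes S3 connectors are already configured.

```python
from pyspark.sql import SparkSession, functions as F

# Minimal batch job: read raw events, aggregate, and write partitioned output.
spark = SparkSession.builder.appName("daily_event_rollup").getOrCreate()

# Hypothetical input path and schema (expects an "event_ts" timestamp column).
events = spark.read.parquet("s3a://example-bucket/raw/events/")

daily_counts = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date", "event_type")
    .agg(F.count("*").alias("event_count"))
)

# Partition output by date so downstream readers can prune efficiently.
(
    daily_counts.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3a://example-bucket/curated/daily_event_counts/")
)

spark.stop()
```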
Preferred Qualifications
Advanced degree: Master's degree in Computer Science or a related field, providing a strong theoretical foundation in large-scale distributed systems and data processing algorithms.
Modern data technology: Exposure to agentic AI patterns, knowledge base systems, and expert systems is a plus. Experience with real-time stream processing frameworks like Apache Kafka, Apache Flink, Apache Beam, or pub/sub real-time messaging systems is a plus.
Advanced database and data warehouse expertise: Familiarity with diverse database technologies beyond relational, such as NoSQL, NewSQL, key-value, columnar, graph, document, and time-series databases. Ability to design and optimize schemas and data models for analytics use cases, with experience in modern data storage solutions such as data warehouses (Redshift, BigQuery, Databricks, Snowflake).
Additional programming languages: Proficiency in additional languages such as Java or Scala is a plus.
Cloud and infrastructure expertise: Experience with AWS cloud platforms and hands-on skills in infrastructure as code (SDK, CDK, Terraform) and container orchestration (Docker/Kubernetes) for automated environment setup and scaling (see the infrastructure-as-code sketch after this list).
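Purely as an illustration of the infrastructure-as-code skills listed above, here is a minimal AWS CDK (v2, Python) sketch that provisions a versioned S3 bucket. The stack and bucket names are hypothetical and not part of this posting.

```python
from aws_cdk import App, Stack, RemovalPolicy, aws_s3 as s3
from constructs import Construct


class DataLakeStack(Stack):
    """Hypothetical stack that provisions a single raw-data bucket."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        s3.Bucket(
            self,
            "RawEventsBucket",                    # logical id (hypothetical)
            versioned=True,                       # keep object history
            removal_policy=RemovalPolicy.RETAIN,  # keep data if the stack is deleted
        )


app = App()
DataLakeStack(app, "DataLakeStack")  # hypothetical stack name
app.synth()
```

With the CDK CLI installed, `cdk synth` and `cdk deploy` would generate and apply the corresponding CloudFormation template.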
Benefits
Salary range: $150,000 - $200,000
Free snacks and drinks, and provided lunch on Fridays
Fully paid medical, dental, and vision insurance (partial coverage for dependents)
Contributions to 401k funds
Bi-annual reviews and annual pay increases
Health and wellness benefits, including free gym membership
Quarterly team-building events

At TP-Link Systems Inc., we are committed to equal employment opportunities and to prohibiting discrimination and harassment of any kind. We welcome applicants from diverse backgrounds and value diverse perspectives. Please, no third-party agency inquiries; we are unable to offer visa sponsorship at this time.