Overview
The main goal of the Data Engineering (DE) team is to build robust golden datasets that power our business goal of creating more insights-based products. Making data-driven decisions is key to Plaid's culture. To support that, we need to scale our data systems while keeping our data correct and complete. We provide tooling and guidance to teams across engineering, product, and business, helping them explore our data quickly and safely to get the insights they need, which ultimately helps Plaid serve our customers more effectively.

Data Engineers heavily leverage SQL and Python to build data workflows. We use tools like DBT, Airflow, Redshift, ElasticSearch, Atlan, and Retool to orchestrate data pipelines and define workflows. We work with engineers, product managers, business intelligence, data analysts, and many other teams to build Plaid's data strategy and foster a data-first mindset.

Our engineering culture is IC-driven: we favor bottom-up ideation and empowerment of our incredibly talented team. We are looking for engineers who are motivated by creating impact for our consumers and customers, growing together as a team, shipping the MVP, and leaving things better than we found them.
Responsibilities
- Understanding different aspects of the Plaid product and strategy to inform golden dataset choices, design, and data usage principles.
- Keeping data quality and performance top of mind while designing datasets.
- Advocating for adopting industry tools and practices at the right time.
- Owning core SQL and Python data pipelines that power our data lake and data warehouse.
- Delivering well-documented datasets with defined quality, uptime, and usefulness standards.
Qualifications
- 2+ years of dedicated data engineering experience solving complex data pipeline issues at scale.
- You have experience building data models and data pipelines on top of large datasets (on the order of 500 TB to petabytes).
- You value SQL as a flexible and extensible tool and are comfortable with modern SQL data orchestration tools like DBT, Mode, and Airflow.
- (Nice to have) You have experience working with performant data warehouses and data lakes such as Redshift, Snowflake, and Databricks.
- (Nice to have) You have experience building and maintaining batch and real-time pipelines using technologies like Spark and Kafka.
Salary and location
The target base salary for this position ranges from $163,200/year to $223,200/year in Zone 1 and will vary based on the job's location.
Geographic zones:
- Zone 1 - New York City and San Francisco Bay Area
- Zone 2 - Los Angeles, Seattle, Washington D.C.
- Zone 3 - Austin, Boston, Denver, Houston, Portland, Sacramento, San Diego
- Zone 4 - Raleigh-Durham and all other US cities
Additional compensation in the form of equity and/or commission is dependent on the position offered. Plaid provides a comprehensive benefit plan, including medical, dental, vision, and 401(k). Pay is based on factors such as scope and responsibilities of the position, candidate's work experience and skillset, and location. Pay and benefits are subject to change at any time, consistent with the terms of any applicable compensation or benefit plans.
Employment type
- Full-time
Seniority level
- Mid-Senior level
Job function
- Information Technology
Industries
- Software Development, Technology, Information and Internet, and Financial Services