Next Ventures
A Series A business is revolutionizing the shopping experience, using the power of generative AI and rich messaging technologies to build a personalized shopping assistant for every consumer.
The Role
We are seeking a Principal Data Engineer with deep expertise in Spark to lead the design of the data infrastructure. This is a senior-level, hands-on technical role, ideal for someone passionate about building scalable data systems, mentoring engineers, and helping shape data strategy. As a thought leader on the data engineering team, you will architect systems that support high-performance batch and real-time data processing, power advanced analytics, and drive the AI team forward.
Key Responsibilities:
Own the architecture and strategic direction of scalable, distributed data infrastructure across cloud platforms.
Design and build a data compilation system to normalize, match, and merge products, reviews, and editorial data from thousands of data sources
Use the latest NLP, LLMs, and embedding models to generate the highest quality datasets with automated data auditing and reporting
Implement real-time and batch data processing systems to power AI/ML use cases
Collaborate with engineering, AI and product teams to ensure data availability and reliability
Develop backend data solutions that support microservices architecture and a rapidly scaling product environment
Manage and extend integrations with third-party e-commerce platforms to expand Wizard's data ecosystem
Mentor and support data engineers, establishing best practices
You
8+ years of software development and data engineering experience with demonstrated ownership of production-grade data infrastructure
Bachelor's degree in Computer Science or a related field, or equivalent practical experience.
Deep expertise in building ETL pipelines using Apache Spark, Databricks, or Hadoop is required
Strong understanding of distributed computing and modern data modeling techniques for scalable systems.
Expert in Python with experience implementing software engineering best practices
Hands-on experience with both relational (MySQL / PostgreSQL) and NoSQL (MongoDB, DynamoDB, Cassandra) databases
Excellent communicator and collaborator, with a passion for mentoring, knowledge-sharing, and team growth
Nice to Have:
Experience working in early-stage, high-growth environments
Familiarity with MLOps pipelines and integrating ML models into data workflows.
Passionate about problem-solving with a proactive approach to finding innovative solutions.