Heroku, Inc.

Data Engineering LMTS

Heroku, Inc., Palo Alto, California, United States, 94306

Salesforce AI Research is looking for Senior and Lead level Data/ML Engineers with Python, SQL, and PyTorch experience to help us take one of the world’s most extensive datasets and transform it into amazing products that feel like magic. You will work on cutting-edge AI applications and products. You will work closely with the Salesforce AI research team, brainstorming and driving continuous improvements in data processing, profiling, sampling, testing, and analysis. You will help validate the AI models that power products used by hundreds of millions of people daily. You will also collaborate with senior deep learning researchers and drive AI innovation for Salesforce products.

Responsibilities

Support deep learning projects for research and product purposes, working with research scientists.

Use open source or vendor tools such as Mechanical Turk and CrowdFlower, or develop customized data collection tools.

Analyze, curate, and shape data for model validation.

Identify data gaps, improve data quality, and integrate data from multiple sources.

Perform data profiling, statistical testing, and reliability testing.

Create demos for customers, conferences, and the Salesforce Research website.

Identify and push efforts to improve deep learning/machine learning models.

Partner with Product Managers and Research Scientists to understand requirements and develop prototypes.

Maintain operational excellence and continuous improvement with a proactive attitude.

Design technical solutions, lead architecture, and implement data acquisition and integration projects (batch and real-time).

Define solution architecture to ensure high data quality and timely insights.

Develop design artifacts such as data flow diagrams and data models.

Build data pipelines and processing tools using open source and proprietary technologies.

Serve as a domain expert and mentor for ETL and big data technologies.

Proactively identify and resolve performance and data quality issues, advocating for improvements.

Design tailored data structures and support research with big data pipelines.

Articulate the pros and cons of various technologies and pilot new tools to select the best solutions.

Technology Stack

Platforms: AWS, Google Cloud, Databricks, Heroku, Docker, Kubernetes

Languages: Python, JavaScript/HTML, SQL (others open to discussion)

Frameworks and libraries: PyTorch, TensorFlow, NumPy, Pandas

Required Skills

4+ years in data engineering

Experience building programmatic ETL pipelines with SQL technologies

Strong understanding of databases and handling sophisticated datasets

Knowledge of data governance, verification, and documentation tools

Proficiency with Python, shell scripts, and translating logic into SQL

Experience with scripting, web scraping, and API data retrieval

Experience automating pipelines with tools like Airflow

Ability to adapt to changing business needs and adjust designs accordingly

Experience writing production-level SQL and designing data pipelines

Familiarity with Hadoop ecosystem and similar frameworks

Technical leadership in data lake, warehouse solutions, BI, and big data analytics

Knowledge of data modeling and high-volume ETL/ELT design

Experience with version control (GitHub, Subversion) and CI/CD tools

Experience with cloud platforms like GCP, AWS, or Snowflake

Effective in unstructured, fast-paced environments, with strong communication and self-management skills

Related technical degree required

Benefits & Perks

Visit our benefits site for details on wellbeing reimbursement, parental leave, adoption assistance, fertility benefits, and more.

Salesforce Information

Explore our Salesforce Engineering Site.

In school or graduated within the last 12 months? Visit FUTURE FORCE for opportunities.
