Heroku, Inc.
Salesforce AI Research is looking for Senior- and Lead-level Data/ML Engineers with Python, SQL, and PyTorch experience to help us take one of the world's most extensive datasets and transform it into products that feel like magic. You will work on cutting-edge AI applications and products, collaborating closely with the Salesforce AI Research team to brainstorm and drive continuous improvements in data processing, profiling, sampling, testing, and analysis. You will help validate AI models that power products used by hundreds of millions of people daily, partnering with senior deep learning researchers to drive AI innovation across Salesforce products.
Responsibilities
Support deep learning projects for research and product purposes, working with research scientists.
Utilize open source or vendor tools such as Mechanical Turk and Crowdflower, or develop customized data collection tools.
Analyze, curate, and shape data for model validation.
Identify data gaps, improve data quality, and integrate data from multiple sources.
Perform data profiling, statistical, and reliability testing.
Create demos for customers, conferences, and the Salesforce Research website.
Identify and drive efforts to improve deep learning/machine learning models.
Partner with Product Managers and Research Scientists to understand requirements and develop prototypes.
Maintain operational excellence and continuous improvement with a proactive attitude.
Design technical solutions, lead architecture, and implement data acquisition and integration projects (batch and real-time).
Define solution architecture to ensure high data quality and timely insights.
Develop design artifacts such as data flow diagrams and data models.
Build data pipelines and processing tools using open source and proprietary technologies.
Serve as a domain expert and mentor for ETL and big data technologies.
Proactively identify and resolve performance and data quality issues, advocating for improvements.
Design tailored data structures and support research with big data pipelines.
Articulate the pros and cons of various technologies and pilot new tools to select the best solutions.
Technology Stack
Platforms: AWS, Google Cloud, Databricks, Heroku, Docker, Kubernetes
Languages: Python, JavaScript/HTML, SQL (others open to discussion)
Deep learning frameworks and data libraries: PyTorch, TensorFlow, NumPy, Pandas
Required Skills
4+ years in data engineering
Experience building programmatic ETL pipelines with SQL technologies
Strong understanding of databases and experience handling complex datasets
Knowledge of data governance, verification, and documentation tools
Proficiency with Python, shell scripts, and translating logic into SQL
Experience with scripting, web scraping, API data retrieval
Automating pipelines with tools like Airflow
Ability to adapt to changing business needs and adjust designs accordingly
Experience writing production-level SQL and designing data pipelines
Familiarity with Hadoop ecosystem and similar frameworks
Technical leadership in data lake, warehouse solutions, BI, and big data analytics
Knowledge of data modeling and high-volume ETL/ELT design
Experience with version control (GitHub, Subversion) and CI/CD tools
Experience with cloud platforms like GCP, AWS, or Snowflake
Effective in unstructured, fast-paced environments, with strong communication and self-management skills
Related technical degree required
Benefits & Perks
Visit our benefits site for details on wellbeing reimbursement, parental leave, adoption assistance, fertility benefits, and more.
Salesforce Information
Explore our Salesforce Engineering Site.
In school or graduated within the last 12 months? Visit FUTURE FORCE for opportunities.