Stypi (Acquired by Salesforce)
Data Engineering LMTS
Stypi (Acquired by Salesforce), Palo Alto, California, United States, 94301
Salesforce AI Research Data Engineer
Salesforce is the #1 AI CRM, where humans with agents drive customer success together. Here, ambition meets action. Tech meets trust. And innovation isn't a buzzword; it's a way of life. The world of work as we know it is changing, and we're looking for Trailblazers who are passionate about bettering business and the world through AI, driving innovation, and keeping Salesforce's core values at the heart of it all. Ready to level up your career at the company leading workforce transformation in the agentic era? Agentforce is the future of AI, and you are the future of Salesforce.

Salesforce AI Research is looking for Senior / Lead level Data / ML Engineers with Python, SQL, and PyTorch experience to help us take one of the world's most extensive datasets and transform it into amazing products that feel like magic. You will work on cutting-edge AI applications and products, collaborating closely with the Salesforce AI Research team to drive continuous improvements in moving, aggregating, profiling, sampling, testing, and analyzing data. You will help validate models so that we can bring meaningful AI models to power products used by hundreds of millions of people every day, working with senior deep learning researchers to drive AI innovation for Salesforce products.

Responsibilities:
- Work with research scientists to support cutting-edge deep learning projects for both research and product purposes
- Use open source or vendor tools such as Mechanical Turk and CrowdFlower, or build customized tools/systems, to collect data
- Analyze, curate, and build data shapes for the model under validation
- Identify incomplete data, improve data quality, and integrate data from several data sources
- Perform data profiling, statistical testing, and reliability testing on data (see the sketch after this list)
- Build demos for our customers, conference presentations, and the Salesforce Research website
- Identify and push efforts to improve deep learning/machine learning models
- Partner end-to-end with Product Managers and Research Scientists to understand requirements and bring ideas to prototype
- Champion operational excellence and continuous improvement with a can-do leadership attitude
- Own the technical solution design, and lead the technical architecture and implementation of data acquisition and integration projects, both batch and real time
- Define the overall solution architecture needed to implement a layered data stack that ensures a high level of data quality and timely insights
- Craft technical solutions and assemble design artifacts (functional design documents, data flow diagrams, data models, etc.)
- Build data pipelines using data processing tools and technologies, both open source and proprietary
- Serve the team as a domain expert and mentor for ETL design and related big data and programming technologies
- Proactively identify performance and data quality problems and drive the team to remediate them
- Advocate architectural and code improvements to the team to improve execution speed and reliability
- Design and develop tailored data structures
- Support research by designing, developing, and maintaining all parts of the big data pipeline for reporting, statistical and machine learning, and computational requirements
- Clearly articulate the pros and cons of various technologies and platforms, both open source and proprietary
- Implement proofs of concept on new technologies and tools to help the organization pick the best solutions
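As a rough illustration of the data profiling and reliability testing described above, here is a minimal Pandas sketch; the input file, column thresholds, and the `profile_dataframe` / `check_completeness` helpers are hypothetical, not part of the team's actual tooling:

```python
import pandas as pd

def profile_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize per-column dtype, completeness, and cardinality."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "null_frac": df.isna().mean(),        # share of missing values per column
        "n_unique": df.nunique(dropna=True),  # cardinality per column
    })

def check_completeness(df: pd.DataFrame, max_null_frac: float = 0.05) -> list[str]:
    """Return columns whose missing-value share exceeds a tolerance."""
    null_frac = df.isna().mean()
    return null_frac[null_frac > max_null_frac].index.tolist()

# Example: flag incomplete columns before handing data to model validation.
df = pd.read_csv("training_samples.csv")  # hypothetical input file
print(profile_dataframe(df))
bad_cols = check_completeness(df)
if bad_cols:
    raise ValueError(f"Columns exceed missing-value tolerance: {bad_cols}")
```

In practice a profile like this would feed into the statistical and reliability checks run before a dataset is used for model validation.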
Technology Stack:
- Platforms: AWS, Google Cloud, Databricks, Heroku, Docker, Kubernetes
- Programming languages: Python, JavaScript/HTML, SQL
- Deep learning frameworks and libraries: PyTorch, TensorFlow, NumPy, Pandas

Required Skills:
- 4+ years of experience in data engineering
- Build programmatic ETL pipelines with SQL-based technologies and platforms
- Solid understanding of databases and experience working with sophisticated datasets
- Data governance, verification, and data documentation using current and future-adopted tools and platforms
- Work with different technologies (Python, shell scripts) and translate logic into well-performing SQL
- Perform tasks such as writing scripts, web scraping, getting data from APIs, etc.
- Automate data pipelines using scheduling tools like Airflow (see the sketch after this list)
- Be prepared for changes in business direction and understand when to adjust designs
- Experience writing production-level SQL code and a good understanding of data engineering pipelines
- Experience with the Hadoop ecosystem and similar frameworks
- Previous projects should display technical leadership, with an emphasis on data lake and data warehouse solutions, business intelligence, big data analytics, and enterprise-scale custom data products
- Knowledge of data modeling techniques and high-volume ETL/ELT design
- Experience with version control systems (GitHub, Subversion) and deployment tools (e.g., continuous integration) required
- Experience working with public cloud platforms like GCP, AWS, or Snowflake
- Ability to work effectively in an unstructured and fast-paced environment, both independently and in a team setting, with a high degree of self-management, clear communication, and commitment to delivery timelines
- A related technical degree required
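As a sketch of the Airflow-based pipeline automation listed above (assumes Airflow 2.4+ and its TaskFlow API; the DAG id, schedule, and the extract/load bodies are illustrative placeholders, not an actual Salesforce pipeline):

```python
from datetime import datetime

from airflow.decorators import dag, task

@dag(
    dag_id="example_daily_etl",  # hypothetical pipeline name
    schedule="@daily",           # run once per day
    start_date=datetime(2024, 1, 1),
    catchup=False,               # skip backfilling past runs
)
def example_daily_etl():
    @task
    def extract() -> list[dict]:
        # Placeholder for pulling rows from an API or source database.
        return [{"id": 1, "value": 42}]

    @task
    def load(rows: list[dict]) -> None:
        # Placeholder for writing validated rows to the warehouse.
        print(f"Loading {len(rows)} rows")

    load(extract())

example_daily_etl()
```

Dropping a file like this into the Airflow DAGs folder registers a daily run; a real pipeline would replace the extract and load bodies with calls to the team's actual sources and warehouse.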