Adobe

Sr Applied ML Engineer, Training Data Ecosystem for GenAI

Adobe, San Jose, California, United States, 95112

Lead Applied Machine Learning Engineer

Our Company

Changing the world through digital experiences is what Adobe's all about. We give everyone, from emerging artists to global brands, everything they need to design and deliver exceptional digital experiences! We're passionate about empowering people to create beautiful and powerful images, videos, and apps, and transform how companies interact with customers across every screen. We're on a mission to hire the very best and are committed to creating exceptional employee experiences where everyone is respected and has access to equal opportunity. We realize that new ideas can come from everywhere in the organization, and we know the next big idea could be yours!

The Opportunity

The AI-Platform team is looking for a Lead Applied Machine Learning Engineer with proven experience training models in Generative AI and related applications, including image/video generation, image/video editing, image/video understanding, large language models, and multimodal foundation models. This role combines coordination across ML projects with hands-on delivery to evolve the training data ecosystem that powers our generative AI models for content synthesis and editing. As an ML Engineer for Applied Research at Adobe, you will be joining an outstanding team. You will have the opportunity to research, develop, and deploy large-scale machine learning solutions that advance the world of creativity through features in Adobe's products, reaching millions of people worldwide!

Job Responsibilities

- Conduct pioneering research and development in Generative AI, LLMs, LMMs, and reinforcement learning.
- Develop and deploy novel generative AI technologies in Adobe products.
- Collaborate with world-class researchers and ML engineers to bring research ideas to production.
- Work closely with data scientists, engineers, researchers, and product managers to build AI/ML solutions on the training data ecosystem, delivering robust, scalable, and efficient data pipelines for the entire training data lifecycle.
- Implement the comprehensive strategy for acquiring, processing, curating, annotating, versioning, and ensuring the quality of large-scale training datasets for Adobe Firefly.
- Work across organizational boundaries to align priorities and drive projects forward.
- Evaluate and integrate new tools, technologies, and methodologies to continuously improve the training data infrastructure, workflows, and team productivity.

What You'll Need to Succeed

Core Qualifications:

- Master's or Ph.D. in Computer Science, Data Science, Engineering, AI/ML, or a related technical field.
- Proficient in Python and PyTorch.
- Research or industry experience in training Generative AI models (pre-training and/or post-training) in at least one of the following modalities: image, video, 3D, or audio.
- Expertise in large-scale model training and optimization, including data curation, distributed training, and memory-efficient techniques.
- Experience with post-training techniques such as fine-tuning, alignment, or distillation.
- Proven ability in building, deploying, measuring, and maintaining large-scale generative models (e.g., GANs, diffusion models, Transformers).
- Proven ability in building large-scale distributed ML pipelines focusing on Generative AI.
- In-depth knowledge of machine learning algorithms and their applications to business problems, with a focus on Generative AI, especially image and video.
- Deep understanding of the end-to-end data lifecycle for machine learning, particularly the unique challenges and requirements for training large-scale generative models (e.g., GANs, diffusion models).
- Proficiency with cloud-based data storage and processing technologies (e.g., AWS S3/Glue/EMR, Azure Blob Storage/Data Factory, GCP Cloud Storage/Dataflow, Spark, Databricks, Snowflake).

Preferred Experience:

- Strong publication record in reinforcement learning, LLMs, and LMMs.
- Experience with large-scale generative model training.
- Experience with synthetic data generation.
- Experience working with large-scale datasets.
- Experience working with product teams on technology transfers.
- Experience with MLOps principles and tools, particularly those focused on data management, data versioning (e.g., DVC), and experiment tracking for ML.
- A track record of contributions to open-source data tools, relevant academic publications, or patents in the data management space.
- Experience supporting large-scale generative model training through the delivery of robust and high-quality data solutions.
- Expertise in data quality assessment methodologies, data governance frameworks, data versioning strategies, and data annotation/labeling techniques and platforms.
- Experience working directly with, and managing, large-scale image and/or video datasets, including their acquisition, processing, and quality control.