W. W. Grainger
Senior/Staff Software Engineer - Machine Learning Platform & Operations
W. W. Grainger, Chicago, Illinois, United States, 60290
Location:
CHICAGO, IL, US, 60661-4555; Remote, IL, US

Work Location Type:
Hybrid

About Grainger:
W.W. Grainger, Inc., is a leading broad line distributor with operations primarily in North America, Japan and the United Kingdom. At Grainger, We Keep the World Working by serving more than 4.5 million customers worldwide with products and solutions delivered through innovative technology and deep customer relationships. Known for its commitment to service and award-winning culture, the Company had 2024 revenue of $17.2 billion across its two business models. In the High-Touch Solutions segment, Grainger offers approximately 2 million maintenance, repair and operating (MRO) products and services, including technical support and inventory management. In the Endless Assortment segment, Zoro.com offers customers access to more than 14 million products, and MonotaRO.com offers more than 24 million products. For more information, visit www.grainger.com.

Compensation
The anticipated base pay range for this position is $121,500.00 to $202,500.00.

Rewards and Benefits:
With benefits starting on day one, our programs provide choice and flexibility to meet team members' individual needs, including:
Medical, dental, vision, and life insurance plans with coverage starting on day one of employment, plus 6 free sessions each year with a licensed therapist to support your emotional wellbeing.
18 paid time off (PTO) days annually for full-time employees (accrual prorated based on employment start date) and 6 company holidays per year.
6% company contribution to a 401(k) Retirement Savings Plan each pay period, with no employee contribution required.
Employee discounts, tuition reimbursement, student loan refinancing, and free access to financial counseling, education, and tools.
Maternity support programs, nursing benefits, and up to 14 weeks of paid leave for birth parents and up to 4 weeks of paid leave for non-birth parents.
For additional information regarding Grainger’s benefits, please see the link in the original posting.

The pay range provided above is not a guarantee of compensation and reflects potential base pay at the time of posting based on the job grade. Individual base pay will depend on location and experience. Grainger reserves the right to amend, modify, or terminate its compensation and benefit programs in its sole discretion at any time, consistent with applicable law.

Role overview
The Machine Learning Platform & Operations team is focused on enabling machine learning scientists and engineers at Grainger to continuously develop, deploy, monitor, and refine machine learning models, as well as on improving the ML software development process. Our mission is to empower Grainger teams to effortlessly build, ship, and scale reliable machine learning, data science, and analytical solutions by proactively listening to our users, anticipating Grainger’s evolving needs, and delivering self-service, quality-first platforms that accelerate business outcomes.

You will work with machine learning, data engineering, network, security, and platform engineering teams to build core components of a scalable, self-service machine learning platform that powers customer-facing applications. You will play an important part in developing the tools and services that form the backbone of Grainger’s AI-driven features, leveraging methods in Deep Learning, Natural Language Processing / Generative AI, Computer Vision, and beyond. This is an exciting opportunity to join a team fueling the next phase in Grainger Technology Group’s data- and AI-driven modernization.
Focus areas
Our team is organized around three focus areas:
Machine Learning Operations & Infrastructure: Build and maintain core infrastructure components (e.g., Kubernetes clusters) and tooling enabling self-service development and deployment of a variety of applications leveraging GitOps practices.
Machine Learning Platform: Design and develop user-friendly software systems and interfaces supporting all stages of the machine learning development lifecycle.
Machine Learning Effectiveness & Enablement: Guide, partner, and consult with machine learning, product, and business domain teams to foster responsible, scalable, and efficient development of high-quality ML systems.

We seek individuals who can contribute to one or more focus areas. Candidates need not have all listed skills; we value curiosity and strong problem-solving. This is a software and platform engineering role, not a research/MLE role; you’ll code, design, and operate the platform that enables teams to build, ship, and scale.

You will:
Build self-service and automated components of the machine learning platform to enable the development, deployment, scaling, and monitoring of machine learning models.
Ship production platform components end-to-end across multiple modules; own reliability, performance, security, and cost from design through operation.
Design Helm releases and author GitOps objects (ArgoCD Applications/Projects) with RBAC/sync policies; keep deployments predictable and auditable.
Collaborate with machine learning, network, security, infrastructure, and platform engineers to ensure performant access to data, compute, and networked services.
Ensure a rigorous deployment process using DevOps standards and mentor users in software development best practices.
Partner with teams across the business to drive broader adoption of ML, enabling teams to improve the pace and quality of ML system development.

You have:
Bachelor’s degree and 5+ years’ relevant work experience, or an equivalent combination of education and experience.
Track record of building and operating production-grade, cloud-deployed systems (AWS preferred) with strong software engineering fundamentals (Python/Go or similar).
Expertise with IaC tools and patterns to provision, manage, and deploy applications to multiple environments using DevOps or GitOps best practices (e.g., Terraform/Helm + GitHub Actions/ArgoCD).
Familiarity with application monitoring and observability tools and integration patterns (e.g., Prometheus/Grafana, Splunk, Datadog, ELK).
Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes).
Ability to work collaboratively in a team environment.

Bonus:
Expertise in designing, analyzing, and troubleshooting large-scale distributed systems and/or working with accelerated compute (e.g., GPUs).
Working knowledge of the machine learning lifecycle and experience with ML systems and monitoring/observability.
Experience with big data technologies, distributed computing frameworks, and/or streaming data processing tools (e.g., Spark, Kafka, Presto, Flink).
Experience deploying, evaluating, and testing GenAI applications and their components (e.g., LLMs, Vector DBs).

Note: If you don’t meet every qualification, we still encourage you to apply.
EEO and accommodations: We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex (including pregnancy), national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or expression, protected veteran status or any other protected characteristic. We are proud to be an equal opportunity workplace and we strive to provide reasonable accommodations during the application and hiring process and throughout employment as needed.