Salesforce, Inc.

Machine Learning Platform Engineer, LMTS

Salesforce, Inc., New York, New York, US, 10261


We are seeking a highly skilled and motivated AI Platform Engineer to play a pivotal role in the development of our ML/AI platform. This role will be instrumental in building, maintaining, and scaling the core infrastructure, platform services, and CI/CD pipelines that underpin our machine learning initiatives and product launches. You will work on critical projects that directly impact the organization's marketing, sales, service, and product growth verticals.

This isn’t a traditional infrastructure role. You should be open to wearing multiple hats—including but not limited to infrastructure, software engineering, and UI/UX development. We’re looking for innovative, out-of-the-box thinkers who aren’t afraid to experiment, build complex systems, and tackle challenges across the stack.

What You’ll Do

Key Responsibilities:

Infrastructure Development:

Design, implement, and manage secure and scalable cloud infrastructure (primarily AWS), including networking, permissions management, data management, and Kubernetes.

ML Platform Services:

Develop and maintain core ML platform components such as Model Registry, permissions services for project access, and tools for SageMaker default setup and deployments.

CI/CD and Workflow Automation:

Build and optimize CI/CD pipelines using GitHub Actions for efficient and secure code deployment, Docker and package building, and security scanning.

Networking:

Ensure robust and secure connectivity for the platform, including ingress (public and VPN), egress, and domain management (Route53). Manage the service mesh (Istio) for traffic routing and secure trust between microservices.

Tooling and Automation:

Implement and manage essential tooling to enhance developer productivity and platform security, including secrets management, package/dependency management, testing frameworks, developer self-service tools, automation scripts/bots, and observability integrations.

Monitoring and Reliability:

Contribute to establishing monitoring solutions (e.g., Grafana, PagerDuty) and integrate security scanning to ensure platform health and security.

Security & Compliance:

Participate in security reviews and ensure all platform components adhere to security best practices and compliance requirements.

Collaboration:

Work closely with cross-functional teams, including ML engineers, data scientists, and product managers, to deliver robust and high-performance solutions.

Documentation:

Create and maintain comprehensive documentation for infrastructure, services, workflows, and user guides.

What We’re Looking For

Proven experience as a Platform Engineer, Software Engineer, or ML Infrastructure Engineer.

Strong software engineering skills, particularly with Python, for building scalable tools, automation scripts, and platform components.

Strong expertise in cloud platforms, particularly AWS (IAM, EKS, S3, SageMaker, etc.).

Extensive experience with CI/CD tools, especially GitHub Actions and ArgoCD.

Proficiency in infrastructure-as-code principles and tools (e.g., Terraform).

Experience with containerization technologies (Docker) and orchestration (Kubernetes).

Understanding of networking concepts within cloud environments and service mesh technologies (e.g., Istio).

Experience with MLOps concepts and tools.

Knowledge of Airflow or other workflow orchestration tools.

Experience with monitoring and alerting systems (Grafana, PagerDuty).

Familiarity with Okta or similar identity and access management systems.

Experience with tenant and project onboarding processes in a multi-tenant environment.

Familiarity with security best practices and conducting security reviews.

Ability to manage multiple priorities and dependencies effectively.

Excellent problem-solving and communication skills.

Preferred Qualifications (Bonus Points)

Exposure to A/B testing experimentation platforms.

Experience with the Salesforce ecosystem.

Experience building and evaluating agents.

Experience with agent memory, MCP servers, etc.

Experience with unstructured databases (vector or graph databases) and RAG pipelines.

Experience working with modern data platforms and real-time processing frameworks, including cloud data warehouses (e.g., Snowflake) and streaming technologies (e.g., Kafka, Flink).

Experience with feature stores such as Feast.
