Tesla Motors, Inc.

Backend Software Engineer, Machine Learning Platform, AI Infrastructure

Tesla Motors, Inc., Palo Alto, California, United States, 94306

What to Expect

As a Software Engineer within the Autopilot AI Infrastructure team, you will work on reinforcing, optimizing, and scaling our infrastructure components supporting AI research activities for Autopilot and Optimus.

At the core of our autonomy capabilities are neural networks that the research team is designing to train on very large amounts of data, across large-scale GPU clusters and our supercomputer Dojo. Robustly training these models at scale and in the shortest amount of time is critical to our mission.

We are building out the Machine Learning Platform that our engineers and leadership use to schedule, manage and monitor machine learning experiments, data pipelines and artifacts. With the ever-increasing size of our datasets and compute clusters, we are looking for an experienced backend engineer to help drive scalability improvements and new capabilities in the platform.

What You'll Do

Develop and deploy solutions to scale our infrastructure effectively in response to rapidly growing demands

Drive implementation of best practices and monitoring systems to proactively detect and address issues in our production environment

Work across the stack on tools and infrastructure empowering the machine learning team to be effective. This ranges from developing/running model training and evaluation code to back-end infrastructure to occasional front-end work

Coordinate required resources with the team managing the cluster hardware to maintain high availability

Work closely with the research team to understand requirements and priorities

What You'll Bring

Expertise in designing scalable and durable distributed systems

Strong knowledge of Python/Go and Linux

Experience working with diverse backend infrastructure components (SQL / NoSQL databases, caching, message brokers, event streams, monitoring, etc.)

Hands-on experience with containerization and orchestration technologies (Docker, Kubernetes) and setting up CI/CD flows

Knowledge of front-end development in React / strong product sense

Knowledge of machine learning, computer vision, or neural networks

Experience working with HPC clusters

Compensation and Benefits

Expected Compensation

$118,000 - $390,000/annual salary + cash and stock awards + benefits

Pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. The total compensation package for this position may also include other elements dependent on the position offered. Details of participation in these benefit plans will be provided if an employee receives an offer of employment.
