Tesla Motors, Inc.
Backend Software Engineer, Machine Learning Platform, AI Infrastructure
Tesla Motors, Inc., Palo Alto, California, United States, 94306
What to Expect
As a Software Engineer on the Autopilot AI Infrastructure team, you will reinforce, optimize, and scale the infrastructure components that support AI research for Autopilot and Optimus.
At the core of our autonomy capabilities are neural networks that the research team designs and trains on very large amounts of data, across large-scale GPU clusters and our supercomputer, Dojo. Training these models robustly, at scale, and in the shortest amount of time is critical to our mission.
We are building out the Machine Learning Platform that our engineers and leadership use to schedule, manage and monitor machine learning experiments, data pipelines and artifacts. With the ever-increasing size of our datasets and compute clusters, we are looking for an experienced backend engineer to help drive scalability improvements and new capabilities in the platform.
What You'll Do
Develop and deploy solutions to scale our infrastructure effectively in response to rapidly growing demands
Drive implementation of best practices and monitoring systems to proactively detect and address issues in our production environment
Work across the stack on tools and infrastructure empowering the machine learning team to be effective. This ranges from developing/running model training and evaluation code to back-end infrastructure to occasional front-end work
Coordinate required resources with the team managing the cluster hardware to maintain high availability
Work closely with the research team to understand requirements and priorities
What You'll Bring
Expertise in designing scalable and durable distributed systems
Strong knowledge of Python/Go and Linux
Experience working with diverse backend infrastructure components (SQL / NoSQL databases, caching, message brokers, event streams, monitoring, etc.)
Hands-on experience with containerization and orchestration technologies (Docker, Kubernetes) and setting up CI/CD flows
Knowledge of front-end development in React / strong product sense
Knowledge of machine learning, computer vision, or neural networks
Experience working with HPC clusters
Compensation and Benefits
Expected Compensation
$118,000 - $390,000/annual salary + cash and stock awards + benefits
Pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. The total compensation package for this position may also include other elements dependent on the position offered. Details of participation in these benefit plans will be provided if an employee receives an offer of employment.