LinkedIn
LinkedIn is the world’s largest professional network, built to create economic opportunity for every member of the global workforce. Our products help people make powerful connections, discover exciting opportunities, build necessary skills, and gain valuable insights every day. We are also committed to providing transformational opportunities for our own employees by investing in their growth. We aspire to create a culture that is built on trust, care, inclusion, and fun where everyone can succeed.
Job Description
This role can be based in Mountain View, CA, San Francisco, CA, or Bellevue, WA. Join us to push the boundaries of scaling large models together.
The team is responsible for scaling LinkedIn's AI model training, feature engineering, and serving, supporting models with hundreds of billions of parameters and large-scale feature-engineering infrastructure for all AI use cases, from recommendation models and large language models to computer vision models. We optimize performance across algorithms, AI frameworks, data infrastructure, compute software, and hardware to harness the power of a GPU fleet with thousands of the latest GPUs. The team also works closely with the open-source community and includes many open-source committers (TensorFlow, Horovod, Ray, vLLM, HuggingFace, DeepSpeed, etc.). Additionally, the team focuses on technologies such as LLMs, GNNs, incremental learning, online learning, and serving-performance optimizations across billions of user queries.
Model Training Infrastructure: As an engineer on the AI Training Infra team, you will play a crucial role in building the next‑generation training infrastructure that powers AI use cases. You will design and implement high‑performance data I/O; work with open‑source teams to identify and resolve issues in popular libraries such as HuggingFace, Horovod, and PyTorch; enable distributed training of models with hundreds of billions of parameters; debug and optimize deep‑learning training; and provide advanced support for internal AI teams in areas such as model parallelism, tensor parallelism, and ZeRO++. You will also help guide the development of containerized pipeline-orchestration infrastructure, including creating and distributing stable base container images, providing advanced profiling and observability, and updating internally maintained versions of deep‑learning frameworks and companion libraries.
Feature Engineering: The team shapes the future of AI with a state‑of‑the‑art Feature Platform that empowers AI users to effortlessly create, compute, store, consume, monitor, and govern features within online, offline, and nearline environments, optimizing the process for model training and serving. As an engineer, you will explore and innovate within the online, offline, and nearline spaces at scale, developing the infrastructure needed to transform raw data into valuable feature insights. You will use open‑source technologies such as Spark, Beam, and Flink to process and structure feature data, ensuring optimal storage in the Feature Store and high‑performance serving.
Model Serving Infrastructure: You will build low‑latency, high‑performance applications that serve very large and complex models, spanning LLMs and personalization models. You will build compute‑efficient infrastructure on cloud‑native platforms, enable GPU‑based inference for a variety of use cases, apply CUDA‑level optimizations for high performance, and enable on‑device and online training. Key challenges include scale, agility, and enabling GPU inference at scale.
MLOps: The MLOps and Experimentation team is responsible for the infrastructure that runs MLOps and experimentation systems across LinkedIn. The team builds AI metadata, observability, orchestration, ramping, and experimentation tools for all models, enabling product and infrastructure engineers to optimize their models and deliver the best performance possible.
As a Senior Software Engineer, you will have opportunities to advance one of the most scalable AI platforms in the world while working with researchers and engineers to build your career and personal brand in the AI industry.
Responsibilities
Design, implement, and optimize large‑scale distributed serving or training for personalized recommendation and large language models.
Improve the observability and understandability of systems to boost developer productivity and system maintainability.
Mentor engineers, help define a rigorous technical culture, and help build a fast‑growing team.
Collaborate with the open‑source community to influence cutting‑edge projects (e.g., vLLM, PyTorch, DeepSpeed, HuggingFace, GNN libraries).
Qualifications
Basic Qualifications
Bachelor’s Degree in Computer Science or related discipline, or equivalent practical experience.
2+ years of experience in the industry building deep‑learning systems.
2+ years of experience with Java, C++, Python, Go, Rust, C#, or functional languages such as Scala.
Hands‑on experience developing distributed or large‑scale systems.
Preferred Qualifications
BS and 5+ years of relevant work experience; MS and 4+ years; or PhD and 2+ years.
Experience working with geographically distributed co‑workers.
Outstanding interpersonal communication skills and ability to work in a diverse, team‑focused environment.
Experience building ML applications, LLM serving, or GPU‑based serving.
Experience with distributed data‑processing engines (Flink, Beam, Spark) and feature engineering.
Experience with large‑scale distributed search or similar systems.
Expertise in machine‑learning infrastructure (MLflow, Kubeflow) and large‑scale distributed systems.
Co‑author or maintainer of open‑source projects.
Familiarity with containers and container‑orchestration systems.
Expertise in deep‑learning frameworks and tensor libraries (PyTorch, TensorFlow, JAX/Flax).
Suggested Skills
ML algorithm development.
Experience in machine learning and deep learning.
Experience in information retrieval, recommendation systems, distributed serving, or big data is a plus.
Benefits
We strongly believe in the well‑being of our employees and their families. We offer generous health and wellness programs and time away for employees at all levels. LinkedIn is committed to fair and equitable compensation practices. The pay range for this role is $139,000–$229,000, based on skill set, experience, and location. Total compensation may also include an annual performance bonus, stock, benefits, and/or other applicable incentive plans.
Equal Opportunity Statement
We seek candidates with a wide range of perspectives and backgrounds and are proud to be an equal‑opportunity employer. LinkedIn considers qualified applicants without regard to race, color, religion, creed, gender, national origin, age, disability, veteran status, marital status, pregnancy, sex, gender expression or identity, sexual orientation, citizenship, or any other legally protected class.
LinkedIn is committed to offering an inclusive and accessible experience for all job seekers, including individuals with disabilities. For reasonable accommodations, please contact accommodations@linkedin.com.
We comply with the San Francisco Fair Chance Ordinance and the Pay Transparency policy, and provide a Global Data Privacy Notice for Job Candidates.