iTCO Solutions

Senior AI/ML Engineer

iTCO Solutions, San Francisco, California, United States, 94199


100% Remote

Job Title: Senior AI/ML Engineer - Large Language Model Pretraining (100B+ Parameters)

Location: West Coast, 100% Remote

Role Overview

We are seeking Senior AI/ML Engineers with PhDs or Master's degrees in Computer Science or related fields from top 20 universities. You will lead the pretraining of massive LLMs (100B+ parameters), a role that requires deep expertise in distributed training, large-scale optimization, and model architecture. This is a rare opportunity to work with petabyte-scale datasets and cutting-edge compute clusters in a high-impact environment.

Key Responsibilities

- Architect and implement large-scale training pipelines for LLMs with 100B+ parameters.
- Optimize distributed training performance across thousands of GPUs/TPUs.
- Collaborate with research scientists to translate experimental results into production-grade training runs.
- Manage and preprocess petabyte-scale datasets for pretraining.
- Implement state-of-the-art techniques in scaling laws, model parallelism, and memory optimization.
- Conduct rigorous benchmarking, profiling, and performance tuning.
- Contribute to the client's research in LLM architecture, training stability, and efficiency.

Required Qualifications

- Advanced degree (PhD or Master's) in Computer Science, Machine Learning, or a related field from a university ranked in the global top 20 for Computer Science.
- 3+ years of hands-on experience with large-scale deep learning model training.
- Proven experience pretraining models exceeding 10B parameters, preferably 100B+.
- Deep expertise in distributed training frameworks (DeepSpeed, Megatron-LM, PyTorch FSDP, Mesh TensorFlow, JAX/TPU).
- Proficiency with parallelism strategies (data, tensor, pipeline) and mixed precision training.
- Experience with large-scale cloud or HPC environments (AWS, Azure, GCP, Slurm, Kubernetes, Ray).
- Strong skills in Python, CUDA, and performance optimization.
- A strong publication record in top-tier ML/AI venues (NeurIPS, ICML, ICLR, ACL, etc.) is preferred.

Preferred Skills

- Experience with LLM fine-tuning (RLHF, LoRA, PEFT).
- Familiarity with tokenizer development and multilingual pretraining.
- Knowledge of scaling laws and model evaluation frameworks for massive LLMs.
- Hands-on work with petabyte-scale distributed storage systems.

E-Verify: United States Employment Opportunities Only

E-Verify is an internet-based system operated by the Department of Homeland Security and the Social Security Administration and allows employers to confirm an individual's employment eligibility to work in the United States. Under the E-Verify rules, effective September 8, 2009, federal agencies subject to the Federal Acquisition Regulation are required to modify, and include in new contracts, a provision that requires federal contractors and subcontractors to use E-Verify. ITCO Solutions is required to adhere to these requirements.
