Cynet Systems
Principal Data Scientist - Remote / Telecommute
Cynet Systems, New York, New York, United States
Job Description
Lead end-to-end training and fine-tuning of Large Language Models (LLMs), including both open-source (e.g., Qwen, LLaMA, Mistral) and closed-source (e.g., OpenAI, Gemini, Anthropic) ecosystems.
Architect and implement GraphRAG pipelines, including knowledge graph representation and retrieval for enhanced contextual grounding.
Build and scale distributed training environments using NCCL and InfiniBand for multi-GPU and multi-node training.
Apply reinforcement learning techniques (e.g., RLHF, RLAIF) to align model behavior with human preferences and domain-specific goals.
Qualifications
PhD or Master's degree in Computer Science, Machine Learning, or a related field.
8+ years of experience in applied AI/ML, with a strong track record of delivering production-grade models.
LLM training and fine-tuning (e.g., GPT, LLaMA, Mistral, Qwen).
Graph-based retrieval systems (GraphRAG, knowledge graphs).
Embedding models (e.g., BGE, E5, SimCSE).
Semantic search and vector databases (e.g., FAISS, Weaviate, Milvus).
Document segmentation and preprocessing (OCR, layout parsing).
Distributed training frameworks (NCCL, Horovod, DeepSpeed).
High-performance networking (InfiniBand, RDMA).
Model fusion and ensemble techniques (stacking, boosting, gating).
Optimization algorithms (Bayesian optimization, particle swarm, genetic algorithms).
Symbolic AI and rule-based systems.
Meta-learning and Mixture of Experts architectures.
Reinforcement learning (e.g., RLHF, PPO, DPO).
Must Have
PhD or Master's degree in Machine Learning/Data Science.
Multi-modal agents.
Experience with text-to-image, image-to-text, and speech-to-text models.
Published papers in Machine Learning/Data Science journals.
Project experience in the medical domain.
Bonus Skills
Experience with healthcare data and medical coding systems (e.g., CPT, ICD-10-CM, ICD-10-PCS).
Familiarity with regulatory and compliance frameworks in AI deployment.