SelectMinds
Overview
Sr Data Scientist (NLP / LLM / Generative AI)
Location: Dallas, TX
Benefits: Competitive salary, flexible schedule, opportunity for advancement
Responsibilities
Design, build, fine-tune, and deploy LLMs, transformer-based NLP models, and GenAI solutions for both batch and real-time/streaming contexts.
Own all major components of ML pipelines: data ingestion, cleaning, pre-processing (structured and unstructured), embedding, search and retrieval, prompt engineering, and RAG (Retrieval-Augmented Generation).
Collaborate closely with ML engineers, MLOps, software engineering, product, compliance, legal, and other teams to move models from prototype to production, ensuring reliability, scalability, monitoring, and maintainability.
Define and implement evaluation frameworks covering accuracy, bias, fairness, hallucination, consistency, and latency; run UAT, stress tests, and drift detection.
Optimize models and pipelines for performance, cost, and efficiency.
Ensure best practices in model development: version control, repeatability, documentation, governance, and ethical AI use.
Mentor more junior data scientists; help build team skills in NLP, GenAI practices, prompt engineering, and fine-tuning.
Identify new use cases; prototype innovations in GenAI/NLP; keep up with the latest research and open-source developments, and decide what to adopt.
Must-Have Qualifications
10+ years of experience in data science / ML, with substantial work in NLP, LLMs, or Generative AI.
Deep hands-on experience in Python, using frameworks such as PyTorch, TensorFlow, and Hugging Face.
Proven track record building transformer, NLP, and LLM models; experience with fine-tuning and prompt engineering.
Solid experience with information retrieval and search: keyword and semantic search, embeddings, vector databases.
Experience deploying models in production (batch and streaming) and working with MLOps practices.
Strong algorithmic, statistical, and mathematical fundamentals.
Ability to reason about model behaviour, bias, and uncertainty.
Good communicator: able to translate complex technical detail for business and non-technical stakeholders.
Nice to Have
Master's in Computer Science, Computational Linguistics, Statistics, Machine Learning, or a related field.
Experience with multimodal models (vision + text) or emerging LLMs and agent-based systems.
Experience with open-source LLMs and toolkits; familiarity with LangChain or similar frameworks.
Prior experience in regulated environments (finance, risk, legal, compliance) with strong governance and privacy requirements.
Work remote temporarily due to COVID-19.