CloudTech Innovations

Data Scientist

CloudTech Innovations, Dallas, Texas, United States, 75215

Job Description

Job Title: Data Scientist - Machine Learning, Big Data, GenAI (8-10 Years Experience)

Location: Remote

Employment Type: Contract

About the Role

We are seeking a highly experienced Data Scientist with 8-10 years of expertise delivering production-grade AI/ML solutions at scale. This role requires deep technical proficiency in Machine Learning, Big Data, Generative AI, Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG). It also requires hands-on cloud experience (AWS, Azure, or GCP) and migration expertise for modernizing data and AI platforms. The ideal candidate can lead projects end-to-end, from architecture design to deployment, while mentoring teams, optimizing for performance and cost, and ensuring alignment with business objectives.

Key Responsibilities

Design, develop, and deliver end-to-end ML/AI solutions in cloud-native environments, from design through deployment and monitoring.

Architect and implement Generative AI solutions leveraging LLMs (e.g., GPT, LLaMA, Claude, Mistral) and RAG pipelines with vector search.

Build and optimize Big Data pipelines using Apache Spark, PySpark, and Delta Lake, integrated with cloud storage (AWS S3, Azure Data Lake, GCP Cloud Storage).

Design and maintain data lakehouse architectures with Databricks, Snowflake, or Delta Lake.

Deploy scalable MLOps pipelines using MLflow, SageMaker, Azure ML, or Vertex AI with Docker, Kubernetes (EKS, AKS, GKE), and CI/CD.

Implement and manage vector databases (Pinecone, FAISS, Milvus, Weaviate, ChromaDB) for RAG applications.

Oversee ETL/ELT workflows and pipeline orchestration using Airflow, dbt, or Azure Data Factory.

Lead migration projects (on-prem to cloud, cross-cloud, or legacy platform upgrades such as Hadoop to Databricks or Hive to Delta Lake), ensuring data integrity and minimal downtime.

Integrate streaming data solutions using Apache Kafka and real-time analytics frameworks.

Conduct feature engineering, hyperparameter tuning, and model optimization for performance and scalability.

Mentor junior data scientists and guide best practices for AI/ML development and deployment.

Collaborate with product, engineering, and executive teams to align AI solutions with business KPIs and compliance requirements.

Required Skills & Experience

8-10 years in data science, machine learning, and AI/ML solution delivery.

Strong hands-on expertise in at least one major cloud platform (AWS, Azure, or GCP) with proven production deployments.

Proficiency in Python, PySpark, and SQL.

Proven experience with Apache Spark, the Hadoop ecosystem, and Big Data processing.

Hands-on experience with Generative AI, Hugging Face Transformers, LangChain, or LlamaIndex.

Expertise in RAG architectures and vector databases (Pinecone, FAISS, Milvus, Weaviate, ChromaDB).

Experience with MLOps workflows using MLflow, Docker, Kubernetes, and CI/CD tools (Jenkins, GitHub Actions, GitLab CI).

Migration experience moving AI/ML workloads, big data pipelines, and data platforms to modern cloud-based architectures.

Knowledge of cloud data services (AWS S3, Redshift; Azure Synapse; GCP BigQuery) and infrastructure-as-code (Terraform, CloudFormation, ARM templates).

Familiarity with streaming technologies (Kafka) and query engines (Hive, Presto, Trino).

Strong foundation in statistics, probability, and ML algorithms.

Preferred Qualifications

Experience with knowledge graphs and semantic search.

Background in NLP, transformer architectures, and deep learning frameworks (TensorFlow, PyTorch).

Exposure to BI tools (Power BI, Tableau, Looker).

Domain expertise in finance, healthcare, or e-commerce.
