CloudTech Innovations
Job Description

Job Title: Data Scientist – Machine Learning, Big Data, GenAI (8–10 Years Experience)
Location: Remote
Employment Type: Contract

About the Role
We are seeking a highly experienced Data Scientist with 8–10 years of expertise delivering production-grade AI/ML solutions at scale. This role requires deep technical proficiency in Machine Learning, Big Data, Generative AI, Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG), combined with hands-on cloud experience (AWS, Azure, or GCP) and migration expertise for modernizing data and AI platforms. The ideal candidate can lead projects end-to-end, from architecture design to deployment, while mentoring teams, optimizing for performance and cost, and ensuring alignment with business objectives.

Key Responsibilities
- Design, develop, and deliver end-to-end ML/AI solutions in cloud-native environments, from design through deployment and monitoring.
- Architect and implement Generative AI solutions leveraging LLMs (e.g., GPT, LLaMA, Claude, Mistral) and RAG pipelines with vector search (see the first sketch after this list).
- Build and optimize Big Data pipelines using Apache Spark, PySpark, and Delta Lake, integrated with cloud storage (AWS S3, Azure Data Lake, GCP Cloud Storage) (see the second sketch after this list).
- Design and maintain data lakehouse architectures with Databricks, Snowflake, or Delta Lake.
- Deploy scalable MLOps pipelines using MLflow, SageMaker, Azure ML, or Vertex AI with Docker, Kubernetes (EKS, AKS, GKE), and CI/CD.
- Implement and manage vector databases (Pinecone, FAISS, Milvus, Weaviate, ChromaDB) for RAG applications.
- Oversee ETL/ELT workflows and pipeline orchestration using Airflow, dbt, or Azure Data Factory.
- Lead migration projects (on-prem to cloud, cross-cloud, or legacy platform upgrades such as Hadoop to Databricks or Hive to Delta Lake), ensuring data integrity and minimal downtime.
- Integrate streaming data solutions using Apache Kafka and real-time analytics frameworks.
- Conduct feature engineering, hyperparameter tuning, and model optimization for performance and scalability.
- Mentor junior data scientists and guide best practices for AI/ML development and deployment.
- Collaborate with product, engineering, and executive teams to align AI solutions with business KPIs and compliance requirements.
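To ground the GenAI and RAG responsibilities above, here is a minimal, illustrative retrieval sketch in Python: documents are embedded, indexed with FAISS, and the top matches are folded into an LLM prompt. The embedding model, corpus, and prompt format are placeholders assuming the sentence-transformers and faiss-cpu packages, not a prescribed production stack.

```python
# Minimal RAG retrieval step: embed documents, index them with FAISS,
# and fetch the top-k passages to ground an LLM prompt.
# Model name, corpus, and prompt are illustrative placeholders.
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

corpus = [
    "Delta Lake adds ACID transactions on top of cloud object storage.",
    "Kubernetes schedules and scales containerized model-serving workloads.",
    "RAG grounds LLM answers in documents retrieved at query time.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model
doc_vecs = model.encode(corpus, normalize_embeddings=True)

# Inner product on normalized vectors is cosine similarity.
index = faiss.IndexFlatIP(doc_vecs.shape[1])
index.add(np.asarray(doc_vecs, dtype="float32"))

query = "How does retrieval-augmented generation work?"
q_vec = model.encode([query], normalize_embeddings=True)
scores, ids = index.search(np.asarray(q_vec, dtype="float32"), 2)

context = "\n".join(corpus[i] for i in ids[0])
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# `prompt` would then be sent to the LLM of choice (GPT, LLaMA, Claude, ...).
print(prompt)
```

In production, the in-memory FAISS index would typically be replaced by one of the managed vector databases the posting lists (Pinecone, Milvus, Weaviate, ChromaDB).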
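Likewise, a minimal sketch of the kind of Spark-plus-Delta batch pipeline this role owns: read raw JSON from object storage, clean it, and write a partitioned Delta table. The bucket paths and column names are hypothetical, and the two session configs assume the delta-spark package is on the classpath.

```python
# Illustrative PySpark batch pipeline writing a Delta table.
# Paths and columns are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("events-pipeline")
    # These two settings enable Delta Lake's SQL/catalog integration.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

raw = spark.read.json("s3://example-bucket/raw/events/")  # hypothetical source

cleaned = (
    raw.dropDuplicates(["event_id"])                      # hypothetical columns
       .withColumn("event_date", F.to_date("event_ts"))
       .filter(F.col("event_type").isNotNull())
)

# Partitioned, transactional Delta write; downstream jobs read it safely.
(cleaned.write.format("delta")
        .mode("overwrite")
        .partitionBy("event_date")
        .save("s3://example-bucket/curated/events/"))
```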
Required Skills & Experience
- 8–10 years in data science, machine learning, and AI/ML solution delivery.
- Strong hands-on expertise in at least one major cloud platform (AWS, Azure, or GCP) with proven production deployments.
- Proficiency in Python, PySpark, and SQL.
- Proven experience with Apache Spark, the Hadoop ecosystem, and Big Data processing.
- Hands-on experience with Generative AI and Hugging Face Transformers, LangChain, or LlamaIndex.
- Expertise in RAG architectures and vector databases (Pinecone, FAISS, Milvus, Weaviate, ChromaDB).
- Experience with MLOps workflows using MLflow, Docker, Kubernetes, and CI/CD tools (Jenkins, GitHub Actions, GitLab CI) (see the sketch after this list).
- Migration experience moving AI/ML workloads, big data pipelines, and data platforms to modern cloud-based architectures.
- Knowledge of cloud data services (AWS S3, Redshift; Azure Synapse; GCP BigQuery) and infrastructure-as-code (Terraform, CloudFormation, ARM templates).
- Familiarity with streaming technologies (Kafka) and query engines (Hive, Presto, Trino).
- Strong foundation in statistics, probability, and ML algorithms.
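As one concrete reading of the MLOps requirement, a minimal MLflow tracking sketch: parameters, a metric, and the fitted model are logged so a registry or CI/CD gate can promote the run later. The dataset and hyperparameters are placeholders, not a prescribed workflow.

```python
# Illustrative MLflow experiment-tracking run with placeholder data.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="rf-baseline"):
    params = {"n_estimators": 200, "max_depth": 8}
    mlflow.log_params(params)

    model = RandomForestClassifier(**params, random_state=42)
    model.fit(X_train, y_train)

    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    mlflow.log_metric("test_auc", auc)

    # Persist the model artifact; a registry stage change would gate deployment.
    mlflow.sklearn.log_model(model, "model")
```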
Preferred Qualifications
- Experience with knowledge graphs and semantic search.
- Background in NLP, transformer architectures, and deep learning frameworks (TensorFlow, PyTorch).
- Exposure to BI tools (Power BI, Tableau, Looker).
- Domain expertise in finance, healthcare, or e-commerce.