Fausto Commercial

Data Scientist ~ NLP & Generative AI Focus

Fausto Commercial, Work From Home



Overview

Title: Data Scientist ~ NLP & Generative AI Focus

Location: Remote

Duration: Temporary

Compensation: Varies based on experience and project scope

Summary

Fausto Commercial is seeking a Data Scientist to lead the development of a cutting-edge, voice-activated real estate assistant. This intelligent system will use natural language processing (NLP) to interpret spoken property inquiries, match them to listings in real time, and capture caller details into a CRM for seamless lead management. Future phases will introduce predictive features that identify potential buyers or tenants for new listings based on historical inquiry patterns—streamlining prospecting and boosting conversion rates.

About the Role

We are seeking an exceptional Data Scientist with a strong background in Natural Language Processing (NLP), Machine Learning (ML), and Big Data technologies to join our fast-growing, innovation-driven team. This role involves building scalable data pipelines, developing state-of-the-art NLP models, and applying generative AI techniques to solve complex business challenges. The ideal candidate is equally passionate about deep technical work and practical applications of AI to drive real-world impact.

Responsibilities

  • Lead development of a voice-activated real estate assistant with NLP-driven query interpretation and real-time listing matching
  • Build scalable data pipelines and data lakes/warehouses to support NLP and AI workloads
  • Develop state-of-the-art NLP models and apply generative AI techniques to business problems
  • Implement retrieval-augmented generation (RAG) and semantic search using vector databases
  • Collaborate cross-functionally with product, engineering, and sales to operationalize AI solutions
  • Monitor, evaluate, and retrain models in production, ensuring quality, reliability, and compliance

Must-Have Skills

  • Bachelor's degree in Computer Science, Data Science, or a quantitatively rigorous discipline
  • 5+ years of experience in a similar role
  • Software Engineering & Programming: Strong command of Python and SQL. Strong grasp of OOP principles (encapsulation, inheritance, abstraction, polymorphism) to build maintainable, modular codebases.
  • Big Data & Data Engineering Pipelines: Experience with building scalable ETL/ELT pipelines that extract, clean, and load large datasets into data lakes/warehouses.
  • NLP: Proficiency in traditional and neural NLP techniques. Feature engineering for text data (TF-IDF, word2vec, FastText) and transformer architectures. Experience with spaCy and NLTK for preprocessing and analysis.
  • Vector Databases & Semantic Search: Understanding of vector embeddings and their application. Experience with vector stores (Pinecone, Weaviate, Milvus) for building Retrieval-Augmented Generation (RAG) systems. Skilled in semantic search and conversational AI applications.
  • Generative AI & LLMs: Hands-on experience with state-of-the-art LLMs (e.g., Llama, Mistral, GPT-4) and generative AI models. Expertise in fine-tuning foundational models for specific tasks. Proficient with Hugging Face Transformers and LangChain for production applications.
  • Machine Learning & Model Ops: Experience training and deploying deep learning models using PyTorch, TensorFlow, and Keras. Knowledge of MLOps, model versioning, orchestration (MLflow or Kubeflow), and monitoring/retraining in production.
  • Data Access Tools: Comfortable working directly with production data stores to query, manipulate, and model data.

Nice to Have

  • Experience with GitHub and REST APIs
  • Visualization tools such as Power BI, and web frameworks such as Django or Flask

Seniority level

  • Mid-Senior level

Employment type

  • Temporary

Job function

  • Engineering and Information Technology