Tata Consultancy Services

Data Scientist

Tata Consultancy Services, Tampa, Florida, US, 33646


Must Have Technical/Functional Skills

• Programming & Libraries: Expert-level proficiency in Python and its core data science libraries (Pandas, NumPy, Scikit-learn); strong proficiency in SQL for complex data extraction and manipulation.
• Machine Learning Frameworks: Hands-on experience with modern deep learning frameworks such as TensorFlow or PyTorch.
• Statistical Modeling: Deep understanding of statistical concepts and a wide range of machine learning algorithms, with proven experience in time-series forecasting and anomaly detection.
• Big Data Technologies: Demonstrable experience working with large datasets using distributed computing frameworks, specifically Apache Spark.
• Database Systems: Experience querying and working with data from multiple relational database systems (e.g., PostgreSQL, Oracle, MS SQL Server).
• Cloud Platforms: Experience building and deploying data science solutions on a major cloud platform (AWS, GCP, or Azure). Familiarity with their native ML services (e.g., AWS SageMaker, Google Vertex AI) is a strong plus.
• MLOps Tooling: Practical experience with MLOps principles and tools for model versioning, tracking, and deployment (e.g., MLflow, Docker).
• Communication and Storytelling: Excellent verbal and written communication skills, with a proven ability to explain complex technical concepts to a non-technical audience through visual storytelling.
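As a concrete illustration of the anomaly-detection skill listed above, here is a minimal z-score detector using only the Python standard library. The function name and threshold are hypothetical placeholders; real pipelines would typically use Pandas/NumPy and a model such as Scikit-learn's IsolationForest.

```python
import statistics

def zscore_anomalies(series, threshold=3.0):
    """Return indices of points whose z-score exceeds the threshold.

    Hypothetical helper for illustration only; production work would
    normally use Pandas/Scikit-learn rather than a hand-rolled z-score.
    """
    mean = statistics.fmean(series)
    stdev = statistics.stdev(series)
    return [i for i, x in enumerate(series) if abs(x - mean) / stdev > threshold]

# A flat series with one spike: only the spike is flagged.
readings = [10.0] * 20 + [100.0]
print(zscore_anomalies(readings))  # [20]
```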

Roles & Responsibilities

• Druid Data Modeling & Schema Design:
  o Design and implement efficient data schemas, dimensions, and metrics within Apache Druid for various analytical use cases (e.g., clickstream, IoT, application monitoring).
  o Determine optimal partitioning, indexing (bitmap indexes), and rollup strategies to ensure sub-second query performance and efficient storage.
• Data Ingestion Pipeline Development:
  o Develop and manage real-time data ingestion pipelines into Druid from streaming sources such as Apache Kafka, Amazon Kinesis, or other message queues.
  o Implement batch data ingestion processes from data lakes (e.g., HDFS, Amazon S3, Azure Blob, Google Cloud Storage) or other databases.
  o Ensure data quality, consistency, and exactly-once processing during ingestion.
• Query Optimization & Performance Tuning:
  o Write and optimize complex SQL queries (Druid SQL) for high-performance analytical workloads, including aggregations, filters, and time-series analysis.
  o Analyze query plans and identify performance bottlenecks, implementing solutions such as segment optimization, query rewriting, or cluster configuration adjustments.
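The rollup strategy mentioned under schema design can be sketched in a few lines: at ingestion time, raw events are pre-aggregated by truncating the timestamp to the chosen granularity and summing metrics per dimension combination, so fewer rows are stored. This is a stdlib-only Python sketch of the idea (function and field names are hypothetical, not Druid's actual API):

```python
from collections import defaultdict
from datetime import datetime

def hourly_rollup(events):
    """Collapse raw (timestamp, page, count) events into hourly buckets,
    summing the metric per (hour, page) pair -- the same idea as Druid's
    ingestion-time rollup. Illustrative sketch only, not Druid's API.
    """
    buckets = defaultdict(int)
    for ts, page, count in events:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # truncate to the hour
        buckets[(hour, page)] += count
    return dict(buckets)

raw = [
    (datetime(2024, 1, 1, 10, 5), "/home", 1),
    (datetime(2024, 1, 1, 10, 40), "/home", 2),
    (datetime(2024, 1, 1, 11, 2), "/home", 1),
]
# Three raw rows collapse to two stored rows: 10:00 -> 3 hits, 11:00 -> 1 hit.
print(hourly_rollup(raw))
```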

Salary Range: $100,000-$120,000 a year

#LI-DM1