TransPerfect

Data Scientist

TransPerfect, Dallas, Texas, United States, 75215

Overview

Locations: Dallas, TX; Philadelphia, PA; or New York, NY

We are looking to hire a Data Scientist with strong expertise in machine learning, speech and language processing, and multimodal systems. This role is essential to driving our product roadmap forward, particularly in building out our core machine learning systems and developing next-generation speech technologies. The ideal candidate will be capable of working independently while effectively collaborating with cross-functional teams. In addition to deep technical knowledge, we are looking for someone who is curious, experimental, and communicative.

Responsibilities

- Create maintainable, elegant code and high-quality data products that are well-modeled, well-documented, and simple to use.
- Build, maintain, and improve the infrastructure to extract, transform, and load data from a variety of sources using SQL, Azure, GCP, and AWS technologies.
- Perform statistical analysis of training datasets to identify biases, quality issues, and coverage gaps.
- Implement automated evaluation pipelines that scale across multiple models and tasks.
- Create interactive dashboards and visualization tools for model performance analysis.
- Design and implement robust data ingestion pipelines for massive-scale text and speech corpora, including automated data preprocessing and cleaning.
- Create data validation frameworks and monitoring systems for dataset quality.
- Develop sampling strategies for balanced and representative training data.
- Implement comprehensive experiment tracking and hyperparameter optimization frameworks.
- Conduct statistical analysis of training dynamics and convergence patterns.
- Design A/B testing frameworks for comparing different training approaches.
- Create automated model selection pipelines based on multiple evaluation criteria.
- Develop cost-benefit analyses for different training configurations.
- Design comprehensive benchmark suites with statistical significance testing.
- Develop fairness metrics and bias detection systems.
- Build real-time monitoring systems for model performance in production.
- Implement feature drift detection and data quality monitoring.
- Design feedback loops to capture user interactions and model effectiveness.
- Create automated retraining pipelines based on performance degradation signals.
- Develop business metrics and ROI analysis for model deployments.

Required Skills and Qualifications

Programming & Software Engineering

- Python (Expert Level): Advanced proficiency in the scientific computing stack (NumPy, pandas, SciPy, scikit-learn).
- Version Control: Git workflows, collaborative development, and code review processes.
- Software Engineering Practices: Testing frameworks, CI/CD pipelines, and production-quality code development.

Machine Learning and Language Model Expertise

- Traditional Machine Learning and Deep Learning Knowledge: Proficiency in classical ML algorithms (Naive Bayes, SVM, Random Forest, etc.) and deep learning architectures.
- Understanding of Transformer Architecture: Attention mechanisms, positional encoding, and scaling laws.
- Training Pipeline Knowledge: Data preprocessing for large corpora, tokenization strategies, and distributed training concepts.
- Evaluation Frameworks: Experience with standard NLP benchmarks (GLUE, SuperGLUE, etc.) and custom evaluation design.
- Fine-tuning Techniques: Understanding of parameter-efficient fine-tuning (PEFT) methods, instruction tuning, and alignment techniques.
- Model Deployment: Knowledge of model optimization, quantization, and serving infrastructure for large models.

Additional Skills

- Framework Proficiency: scikit-learn, XGBoost, and PyTorch (preferred) or TensorFlow for model implementation and experimentation.
- MLOps Expertise: Model versioning, experiment tracking, and model monitoring (MLflow, Weights & Biases); data monitoring and validation (Great Expectations, Prometheus, Grafana); and automated ML pipelines (GitHub Actions, Jenkins, CircleCI, GitLab CI/CD, etc.).
- Statistical Modeling: Hypothesis testing, experimental design, causal inference, and Bayesian statistics.
- Model Evaluation: Cross-validation strategies, bias-variance analysis, and performance metric design.
- Feature Engineering: Advanced techniques for text, time-series, and multimodal data.
- Big Data Technologies: Spark (PySpark), the Hadoop ecosystem, and distributed training strategies (DDP, TP, FSDP).
- Cloud Platforms: AWS (SageMaker, S3, EMR), GCP (Vertex AI, BigQuery), or Azure ML.
- Database Systems: NoSQL databases (MongoDB, Elasticsearch), graph databases (Neo4j), and vector databases or similarity-search libraries (Pinecone, Milvus, ChromaDB, FAISS, etc.).
- Data Pipeline Tools: Airflow, Prefect, or similar orchestration frameworks.

Company and Benefits

Where Your Career Is Going: At TransPerfect, there are many growth opportunities. All departments offer career growth and development that can combine your skills, interests, and experience. We encourage our employees to maintain a continuous dialogue with management about growth opportunities throughout their tenure with the company. End your job search and find your career at TransPerfect #careersNOTjobs.

Why TransPerfect: For more than 25 years, we have honed a culture where all kinds of ideas are shared and new ventures are not only welcomed but also encouraged. In this fast-paced environment, employees are intellectually stimulated so they can grow alongside the organization. From Intern to President, we believe that every single employee should have a voice and contribute to the amazing services we offer our clients. We also offer a comprehensive benefits package including medical, dental, and vision insurance, 401(k) matching, membership to child-care providers, and other TransPerks. You even get your birthday off because, let's face it, we're stoked that you were born.

TransPerfect provides equal employment opportunity to all individuals regardless of their race, color, creed, religion, gender, age, sexual orientation, national origin, disability, veteran status, or any other characteristic protected by state, federal, or local law. TransPerfect is committed to recruitment processes and a workplace free from harassment, sexual harassment, and discrimination. For more information on the TransPerfect Family of Companies, please visit our website at www.transperfect.com.

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

Information Technology

Industries

Translation and Localization