Gravity IT Resources

Machine Learning Engineer

Gravity IT Resources, Nashville, Tennessee, United States, 37247

Employment Type: Full-Time

Location: Nashville, TN (hybrid)

About the Role

We're hiring a Machine Learning Engineer to design and deploy AI systems end to end, from data preparation and evaluation to model fine-tuning, inference, and agentic workflows. You'll work closely with product and engineering teams to deliver reliable, cost-effective, and scalable LLM-powered solutions on AWS.

What You'll Do

End-to-End GenAI Solutions: Scope problems, choose the right approach (prompt engineering, fine-tuning, or agents), then implement, evaluate, and deploy.

Data & SQL: Write efficient SQL for analytics and data preparation; manage schemas and pipelines for model training and inference.

Model Training & Fine-Tuning: Run supervised fine-tuning (PEFT/LoRA/QLoRA), optimize prompts, and manage experiment tracking and evaluation.

Agentic Systems: Build agent workflows with tool use, memory, and safety guardrails.

Inference & Deployment: Package services with Docker, optimize latency and cost (batching, caching, quantization), and deploy on AWS (ECS, EKS, SageMaker, Lambda), using GPU acceleration where needed.

MLOps & Observability: Set up CI/CD for models and prompts; maintain offline and online evaluation pipelines, monitoring, and rollback strategies.

Security & Compliance: Implement data governance, PHI/PII protections, and guardrails against prompt injection and unsafe outputs.

Cross-Functional Collaboration: Work with product managers and engineers to align GenAI capabilities with product goals; clearly document and communicate trade-offs.

Production Readiness: Lead conversations around scaling, monitoring, and maintaining GenAI systems in production environments.

Minimum Qualifications

5+ years of software/ML engineering experience, including 2+ years building and deploying GenAI/LLM systems.
MS/PhD in Computer Science, Data Science, or a related field, or equivalent experience.
Strong SQL and Python skills with solid software engineering fundamentals.
Experience with agent frameworks (LangGraph, AutoGen, CrewAI) and tool-driven agents.
Hands-on experience with deep learning (PyTorch or TensorFlow) and LLM fine-tuning (SFT and PEFT methods such as LoRA/QLoRA).
Production experience with Docker and AWS (ECS, EKS, SageMaker, Lambda, or GPU services).
Experience building scalable data and model pipelines for training and deployment.
Familiarity with prompt engineering, evaluation frameworks (LLM-as-judge, metrics), and offline test harnesses.
Understanding of security and compliance requirements for sensitive data (e.g., PHI/PII).
Excellent problem-solving, communication, and documentation skills.

Preferred Qualifications

Experience with inference optimization: quantization (bitsandbytes, GPTQ/AWQ), batching, caching, or vLLM.
Background in healthcare, including HIPAA compliance or medical data handling.
Experience with experiment tracking (MLflow, W&B), CI/CD for ML, and monitoring tools (Prometheus, Grafana).
Familiarity with major LLM APIs (OpenAI, Anthropic) and open-source models (Llama, Mistral).

Tech Stack

Languages: Python, SQL
DL/LLM: PyTorch, TensorFlow, Hugging Face, PEFT/TRL, vLLM
Data: Snowflake, Postgres
Cloud: AWS (ECS, EKS, SageMaker, Lambda)
MLOps: Docker, CI/CD, MLflow or W&B