Judge Group, Inc.

AI/ML Data Reliability Engineer – AIOps

Judge Group, Inc., Austin, Texas, US, 78716

Location:

Austin, TX – 100% On-Site

Duration:

12+ Months (Contract)

Citizenship Requirement:

U.S. Citizen, Permanent Resident or EAD

Position Overview

We are seeking an AI/ML Data Reliability Engineer to join our AIOps team, driving the design and delivery of intelligent, automated solutions leveraging modern AI and ML technologies. This team’s mission is to explore and implement the latest advancements in Generative AI, ML, and automation tools to streamline manual workflows and develop innovative internal and external applications.

Required Skills & Qualifications

5+ years of experience in Python development, with a strong background in AI/ML or data engineering.

Proven leadership experience guiding technical teams or projects in an AI/ML or software engineering environment.

Hands‑on experience integrating or building applications using LLMs (GPT, Claude, etc.).

Proficiency with Google Cloud Platform and related AI/ML services.

Experience with GitHub Actions, Splunk, Grafana, and MongoDB.

Strong understanding of AIOps concepts, automation frameworks, and observability tools.

Ability to balance strategic design thinking with hands‑on development and delivery.

Nice to Have

Experience with PCF (Pivotal Cloud Foundry).

Prior exposure to GenAI platform adoption or evaluation within enterprise environments.

Familiarity with data pipelines, API development, and AI model lifecycle management.

Key Responsibilities

Lead a team of 4 Data Engineers, providing technical direction, architectural guidance, and hands‑on coding support (does not require people management).

Design and build Python‑based AI/ML tools, both in‑house and integrated with enterprise platforms.

Develop applications powered by LLMs (e.g., GPT, Claude) – embedding AI models at the foundational layer through to end‑user application interfaces.

Collaborate with stakeholders to identify opportunities where AI/ML can enhance operational efficiency and enable new capabilities.

Integrate AI solutions with existing infrastructure using Google Cloud Platform as the primary environment.

Leverage tools such as Splunk, Grafana, and GitHub Actions for observability, automation, and continuous integration.

Partner with cross‑functional teams to adopt or extend external AI platforms and develop creative solutions tailored to organizational needs.

Participate in end‑to‑end delivery of GenAI and ML applications – from experimentation to production deployment.
