ZipRecruiter

Infrastructure Engineer

ZipRecruiter, Houston, Texas, United States, 77246

Job Description

Infrastructure Engineer

Division: DATUM, Impac Exploration Services

Location: Oklahoma City (OK), Houston (TX)

Type: Full-Time

Build Infrastructure for the Next of Industrial AI

We're looking for an infrastructure engineer who gets excited about making AI work in the real world—not just in pristine data centers.

You'll architect and build infrastructure that bridges the gap between cutting-edge ML research and production deployments. This isn't your typical DevOps role—you'll be creating novel architectures and solving challenges that sit at the intersection of high-performance computing, distributed systems, and industrial operations.

The Real Environment

You'll be designing and building from first principles, iterating rapidly based on what our researchers need and what reality demands. If you thrive when given a complex problem and the freedom to solve it your way, you'll love this.

We move fast. Ship fast. Learn fast. Your architecture sketch from Monday might be in production by Friday.

What You'll Own

Novel infrastructure architectures that don't exist elsewhere

Systems design from whiteboard to production deployment

Platform decisions that shape how we scale

Infrastructure that makes our data scientists dangerously productive

The technical foundation for AI that works where others can't

Building the playbook others will eventually copy

Technical Stack & Expertise

Hardware/Compute:

NVIDIA GPUs (A100, H100, A6000) and their quirks

GPU interconnects (NVLink, InfiniBand)

Server platforms (Dell PowerEdge, HPE Apollo, Supermicro)

Understanding of CUDA, memory hierarchies, and GPU optimization

Orchestration & Containers:

Kubernetes in anger (not just tutorials)

Container runtimes (Docker, containerd, CRI-O)

Service mesh (Istio, Linkerd)

Helm, Kustomize, or similar for deployment management

Infrastructure & Networking:

Terraform, Ansible, or Pulumi for IaC

BGP, VXLAN, and software-defined networking

Load balancing at layer 4 and 7

Storage solutions (Ceph, MinIO, NetApp)

ML Infrastructure:

Kubeflow, MLflow, or similar ML platforms

GPU scheduling (NVIDIA GPU Operator, MIG)

Distributed training frameworks

Model serving infrastructure (Triton, TorchServe)

You're Our Person If

You see undefined requirements as creative freedom

You've built infrastructure without Stack Overflow because no one's solved it yet

"It's never been done" sounds like a challenge, not a warning

You can move from architecture diagrams to kubectl commands

Complex distributed systems are your canvas

You can explain your choices without defaulting to "best practices"

Especially If

You've built GPU clusters that actually stayed up

You've created systems that surprised even you with what they could do

You understand when to build vs. buy vs. fork

You've made infrastructure decisions with incomplete information—and been right

You can prototype in the morning and production-harden in the afternoon

You've worked where "good enough" isn't

The Opportunity

This is a chance to build without bureaucracy. You'll:

Define architectures that become the standard for industrial AI

Work directly with ML researchers who push your systems to their limits

Make decisions that would take months of committees elsewhere

Build infrastructure that enables entirely new capabilities

Create systems that work where cloud providers fear to tread

Why This Hits Different

No legacy systems to maintain or migrate

Budget to build right, not just cheap

Direct line from your ideas to production

Team that understands that infrastructure enables everything else

Problems that haven't been solved before

Freedom to define how industrial AI infrastructure should work

Ready?

Show us infrastructure you've built that others said was impossible. Tell us about a time you threw out the playbook and built something better. Share your thoughts on where ML infrastructure is heading.

We're looking for builders who see constraints as design inspiration, not limitations.

We are not currently sponsoring visas or participating in CPT programs.