Diversified Services Network, Inc.

Senior Server Administrator

Diversified Services Network, Inc., Dallas, Texas, United States, 75215

Position: Sr. Server Administrator

Job Locations: Dallas, TX; Peoria, IL; Phoenix, AZ; Broomfield, CO; Cary, NC

Employment Type: Full-time, W2 ONLY – Absolutely NO C2C (will NOT respond to vendors).

Salary: $67.00/hr - $72.00/hr (base pay)

Responsibilities

Administer and maintain GPU‑accelerated servers and clusters, including NVIDIA A100, H100, and other high‑end GPUs.

Manage and optimize NVIDIA software stack components such as CUDA, cuDNN, TensorRT, NCCL, and NGC containers.

Monitor system performance, troubleshoot hardware/software issues, and ensure high availability of AI infrastructure.

Collaborate with DevOps and AI teams to support containerized workflows (Docker, Kubernetes) and distributed training environments.

Implement security best practices and ensure compliance with internal and external standards.

Lead upgrades, patching, and lifecycle management of GPU servers and related infrastructure.

Provide documentation, automation scripts, and training for internal teams.

Education

Bachelor’s degree with a minimum of 8 years’ work experience, including 5+ years in server administration, at least 3 of which focused on NVIDIA GPU‑based systems.

Required Skills

5+ years of experience in server administration, with at least 3 years focused on NVIDIA GPU‑based systems.

Deep understanding of Linux system administration, especially in HPC or AI environments.

Hands‑on experience with NVIDIA GPU drivers, CUDA toolkit, and performance tuning.

Familiarity with Slurm, Kubernetes, or other job scheduling and orchestration tools.

Experience with monitoring tools (e.g., Prometheus, Grafana) and infrastructure automation (e.g., Ansible, Terraform).

Excellent problem‑solving and communication skills.

Desired Skills

NVIDIA Certified Professional or similar credentials.

Experience with multi‑GPU and multi‑node training setups.

Familiarity with AI/ML frameworks (e.g., PyTorch, TensorFlow) and their GPU dependencies.

Exposure to cloud‑based GPU infrastructure (AWS, Azure, GCP).

Benefits

401(k)

Vision insurance

Disability insurance

Employee assistance program

Health insurance

Health savings account

Life insurance

Paid time off

Paid holidays
