Diversified Services Network, Inc.
Senior Server Administrator
Diversified Services Network, Inc., Dallas, Texas, United States, 75215
Position
Sr. Server Administrator
Job Locations: Dallas, TX; Peoria, IL; Phoenix, AZ; Broomfield, CO; Cary, NC
Employment Type: Full-time, W2 ONLY – Absolutely NO C2C (will NOT respond to vendors).
Salary: $67.00/hr - $72.00/hr (base pay)
Responsibilities
Administer and maintain GPU‑accelerated servers and clusters, including NVIDIA A100, H100, and other high‑end GPUs.
Manage and optimize NVIDIA software stack components such as CUDA, cuDNN, TensorRT, NCCL, and NGC containers.
Monitor system performance, troubleshoot hardware/software issues, and ensure high availability of AI infrastructure.
Collaborate with DevOps and AI teams to support containerized workflows (Docker, Kubernetes) and distributed training environments.
Implement security best practices and ensure compliance with internal and external standards.
Lead upgrades, patching, and lifecycle management of GPU servers and related infrastructure.
Provide documentation, automation scripts, and training for internal teams.
Education
Bachelor’s degree with a minimum of 8 years’ work experience, including 5+ years in server administration and at least 3 years focused on NVIDIA GPU‑based systems.
Required Skills
5+ years of experience in server administration, with at least 3 years focused on NVIDIA GPU‑based systems.
Deep understanding of Linux system administration, especially in HPC or AI environments.
Hands‑on experience with NVIDIA GPU drivers, CUDA toolkit, and performance tuning.
Familiarity with Slurm, Kubernetes, or other job scheduling and orchestration tools.
Experience with monitoring tools (e.g., Prometheus, Grafana) and infrastructure automation (e.g., Ansible, Terraform).
Excellent problem‑solving and communication skills.
Desired Skills
NVIDIA Certified Professional or similar credentials.
Experience with multi‑GPU and multi‑node training setups.
Familiarity with AI/ML frameworks (e.g., PyTorch, TensorFlow) and their GPU dependencies.
Exposure to cloud‑based GPU infrastructure (AWS, Azure, GCP).
Benefits
401(k)
Vision insurance
Disability insurance
Employee assistance program
Health insurance
Health savings account
Life insurance
Paid time off
Paid holidays