Cirrascale Cloud Services

Network Operations Technician II

Cirrascale Cloud Services, Austin, Texas, US, 78716

Location: Austin, TX

Job Overview

As a Network Operations Technician II at Cirrascale Cloud Services, you will play a key role in maintaining the integrity, performance, and uptime of our GPU‑based data centers. This role focuses on advanced technical operations, hands‑on troubleshooting, and mentorship, supporting the NOC Manager in day‑to‑day operations and team development.

Responsibilities

Respond to alerts and incidents for systems, jobs, and GPU cluster failures.

Troubleshoot and repair servers, GPU clusters, and network equipment at global datacenter locations.

Lead resolution efforts for complex and critical incidents, escalating to the Supervisor as needed.

Assist customers with ticket triage and advanced troubleshooting using Jira (Atlassian).

Create and optimize procedures, runbooks, and automation scripts to support NOC efficiency.

Monitor system performance, support capacity planning, and analyze GPU cluster utilization.

Collaborate with Development Engineering to refine alerting, dashboards, and monitoring tools.

Document incidents, alerts, system updates, and configurations in alignment with NOC standards.

Serve as the sole trainer for all new NOC employees, providing structured onboarding while remaining under Supervisor guidance.

Develop and maintain a 5‑day training SOP, broken down by day, covering hands‑on practice, SOP/script reviews, shadowing, and reverse shadowing.

Focus training across all NOC roles (I, II, III) to ensure readiness.

Evaluate new hires and sign off at the end of the training week, reporting outcomes to the Supervisor.

Standardize training to ensure consistency, freeing other team members from ad‑hoc onboarding tasks.

Support ongoing mentorship and coaching under the direction of the Supervisor.

Work closely with the Supervisor and NOC Manager to execute operational priorities and maintain team workflow.

Participate in shift handovers and on‑call rotations as needed, escalating issues to the Supervisor when appropriate.

Support process improvements, SOP updates, and documentation initiatives driven by the Supervisor or NOC Manager.

Qualifications

4–6 years in HPC, AI infrastructure, cloud systems, or related environments.

Strong scripting skills (Python, Bash, or similar), with GPU monitoring experience.

Advanced troubleshooting experience in HPC datacenter networking and GPU clusters.

Excellent analytical, problem‑solving, and organizational skills.

Strong written and verbal communication skills; customer‑facing experience is critical.

Certifications: Ubuntu, Advanced Linux, Kubernetes (CKA/CKAD), Docker, or AI/ML certifications preferred.

Experience with Supermicro and Lenovo servers strongly recommended.

Familiarity with Jira ticketing, Microsoft 365 Suite, Slack, and Microsoft Teams.

Understanding of RMAs, logistics, shipping, and receiving a plus.

Key Notes on Role Alignment

The NOC II role is a technical and mentorship role, not a leadership or managerial role.

All training, onboarding, and escalation responsibilities are performed under the guidance and oversight of the Supervisor, ensuring alignment with broader NOC operations.

The role supports both the Supervisor and the NOC Manager in operational continuity, incident response, and process improvement initiatives.

Schedule

Monday through Sunday. The NOC is a 24/7, 365‑day operations environment.
