Cirrascale Cloud Services

Network Operations Center Technician

Cirrascale Cloud Services, Austin, Texas, US 78716

Network Operations Technician – Hiring for Multiple Levels

About Cirrascale

Cirrascale Cloud Services provides high-performance cloud infrastructure purpose-built for deep learning, generative AI, and large-scale AI inference workloads. We specialize in dedicated GPU cloud solutions tailored to the unique needs of startups, research labs, and enterprise AI teams. Our mission is to accelerate AI innovation by combining powerful hardware with white-glove service and flexible, custom-built environments.

Position Overview

As a Network Operations Technician at Cirrascale Cloud Services, you will play a key role in maintaining the integrity, performance, and uptime of our GPU-based data centers. This role focuses on advanced technical operations, hands‑on troubleshooting, and mentorship, supporting both the NOC Supervisor and the NOC Manager in day‑to‑day operations and team development.

You will be responsible for advanced monitoring, incident resolution, and support of a 24/7 NOC environment. In addition, you will serve as the primary trainer for new NOC employees, ensuring consistent onboarding and hands‑on skills development while supporting the Supervisor in maintaining operational excellence.

Key Responsibilities

Advanced Technical Operations & Incident Management

Respond to alerts and incidents involving systems, jobs, and GPU cluster failures.

Troubleshoot and repair servers, GPU clusters, and network equipment at global datacenter locations.

Collaborate with NOC I and NOC II technicians to resolve tickets.

Lead resolution efforts for complex and critical incidents and upgrades, escalating to the Supervisor as needed.

Assist customers with ticket triage and advanced troubleshooting using Jira (Atlassian).

Create, optimize, and maintain procedures, runbooks, and automation scripts to support NOC efficiency.

Help the NOC Supervisor tune and maintain customer dashboards and runbooks.

Monitor system performance, support capacity planning, and analyze GPU cluster utilization.

Collaborate with Development Engineering to refine alerting and monitoring tools.

Document incidents, alerts, system updates, and configurations in alignment with NOC standards.

Training & Onboarding

Serve as the sole trainer for all new NOC employees, providing structured onboarding while remaining under Supervisor guidance.

Develop and maintain a 5‑day training SOP, broken down by day, covering hands‑on practice, SOP/script reviews, shadowing, and reverse shadowing.

Cover training for all NOC roles (I, II, and III) to ensure readiness.

Evaluate new hires and sign off at the end of the training week, reporting outcomes to the Supervisor.

Standardize training to ensure consistency, freeing other team members from ad‑hoc onboarding tasks.

Support ongoing mentorship and coaching under the direction of the Supervisor.

Collaboration & Support (for NOC Technician III role only)

Work closely with the Supervisor and NOC Manager to execute operational priorities and maintain team workflow.

Participate in shift handovers and on‑call rotations as needed, escalating issues to the Supervisor when appropriate.

Support process improvements, SOP updates, and documentation initiatives driven by the Supervisor or NOC Manager.

Qualifications

4–6 years of experience in NOC, HPC (high‑performance computing), AI infrastructure, cloud systems, or a related field.

Mentorship & Training Leadership (NOC Technician III only): Serve as a key mentor for the team by training and coaching NOC Technician I and II staff, providing day‑to‑day guidance, knowledge transfer, and performance support. Requires prior experience mentoring and training junior or lower‑level peers in a technical operations environment.

Strong scripting skills (Python, Bash, or similar) and experience with GPU monitoring.

Remote and hands‑on experience with Ubuntu Linux 22.04 and 24.04 preferred.

Advanced troubleshooting experience in HPC datacenter networking and GPU clusters.

Excellent analytical, problem‑solving, and organizational skills.

Strong written and verbal communication skills; customer‑facing experience is critical.

Certifications: Advanced Linux, Kubernetes (CKA/CKAD), Docker, or AI/ML certifications preferred.

Experience with Supermicro, Lenovo, and Dell servers strongly preferred.

Familiarity with Jira ticketing, Microsoft 365 Suite, Slack, and Microsoft Teams.

Understanding of RMAs, logistics, shipping, and receiving is a plus.

Key Notes on Role Alignment

The NOC III role is a technical and mentorship role, not a leadership or managerial role.

All training, onboarding, and escalation responsibilities are performed under the guidance and oversight of the Supervisor, ensuring alignment with broader NOC operations.

The role supports both the Supervisor and the NOC Manager in operational continuity, incident response, and process improvement initiatives.

Schedule

NOC Schedule & Shift Flexibility:

This role supports a Sunday-through-Saturday Network Operations Center (NOC) schedule, and candidates must be available to work any shift (e.g., days, evenings, or overnight) based on operational needs.

The NOC operates 24/7, 365 days a year.

Work Location: Austin, TX

Benefits

401(k) with company match.

Health, dental, and vision insurance.

Paid time off (PTO).

Opportunities for professional development and growth.

Why Join Cirrascale?

Join a growing team that’s pushing the boundaries of AI infrastructure. At Cirrascale, you’ll contribute to projects powering next‑generation AI applications while working with top‑tier hardware in a collaborative and innovative environment. From custom deployments to hands‑on customer support, every role here plays a part in enabling breakthroughs in AI.

Cirrascale Cloud Services is an equal‑opportunity employer committed to diversity and inclusion.

To apply, email careers@cirrascale.com.
