Senior Software Development Engineer in Test
NVIDIA - Santa Clara
Overview
We are seeking a highly skilled and dedicated Senior Test Developer / Test Engineer to join our Enterprise Software QA team. This role offers an excellent opportunity to influence the design, construction, optimization, and testing of large-scale infrastructure for NVIDIA's cloud services and data center offerings.
Responsibilities:
- Collaborate with development teams on test plans covering every layer of the cloud infrastructure software stack, including execution, reviews, failure analysis, and risk assessment. Communicate with customer PMs about software issues and relay technical feedback from OEMs and CSPs. Develop KPIs to track execution and drive process improvements.
- Lead NVIDIA Cloud and Data Center bring-up activities, including validation, reporting, debugging, design input, and coverage enhancement.
- Design, develop, and maintain CI/CD pipelines for continuous testing in cloud environments.
- Conduct performance, scalability, and reliability testing of cloud services.
- Implement and maintain test environments on cloud platforms such as AWS, Azure, and Google Cloud.
- Monitor infrastructure and configure alerts for significant events to maintain system performance and reliability.
- Coordinate with partner teams to ensure cluster availability for testing and lead issue resolution.
- Collaborate with teams to ensure the quality of cloud products, focusing on security, storage, workloads, and performance with the latest software and firmware components.
Qualifications:
- A Master's or Ph.D. in Computer Science or a related field, or equivalent experience.
- Experience with AI development tools for creating, automating, and triaging test cases.
- At least 4 years of hands-on experience with cluster management tools such as Docker, Slurm, Kubernetes, and Ansible.
- Minimum of 2 years of experience with cloud platforms like AWS, Azure, Google Cloud, or OCI.
- Proficiency in Unix/Linux, shell scripting, and Python programming.
- Experience with network, storage, security, cluster configuration, and debugging; familiarity with cloud management tools like Terraform and Ansible.
- Expertise in Kubernetes administration and configuration.
- Experience with CI/CD tools like GitLab and Jenkins, and the GitOps model.
- Knowledge of monitoring tools such as Prometheus, Grafana, CloudWatch, and Thanos.
- Ability to troubleshoot issues involving networks, DHCP, DNS, HTTP, Linux, and containers.
Preferred Skills:
- Familiarity with Bright Cluster Manager for HPC management.
- Experience automating web applications using Selenium, Playwright, etc.
NVIDIA offers a competitive salary range of $136,000 to $264,500, determined by location and experience, along with equity and benefits. We are committed to fostering diversity and an inclusive work environment. We welcome applications on an ongoing basis from candidates who are passionate about technology and innovation.