VeeAR Projects Inc.
Design and develop high-performance AI frameworks for large-scale distributed computation
Optimize scalability and efficiency using Nvidia Dynamo Framework
Work with distributed dataflow programming to orchestrate GPU workloads using Python and Kubernetes
Integrate advanced LLMs into real-world applications, shaping the future of AI-driven software
Contribute to building test-automation infrastructure for Kubernetes on large-scale GPU clusters.
Help develop detailed test plans for different milestones and operationalize them in test-automation infrastructure.
Own and conduct end-to-end system, scale, and stress testing.
Qualify releases together with SW leads and the Technical Program Manager.
Attract and help build downstream production engineering talent.
Model and foster a culture of humility and innovation in product delivery.
Experience
3–8+ years of experience in software engineering, ideally at a staff level
Strong expertise in distributed dataflow programming and distributed systems
Hands‑on experience with LLMs and AI frameworks
Proficiency in Python, with experience orchestrating GPU workloads
Experience with Kubernetes for containerized application deployment and orchestration
Experience working with systems and systems software, cloud, and Kubernetes.
Experience with production‑testing and automation of Kubernetes deployments.
Preferred Qualifications
Master's or similar qualification in a relevant field.
Experience with scalable test and automation infrastructure to productionize workloads.
Experience with GPU platforms (e.g., Nvidia DGX, H100) and high‑performance computing environments.
Experience triaging customer bugs, prioritizing, and resolving issues in production.
Familiarity with AI developer frameworks, tools, and automation systems