Cynet Systems Inc
DevOps Engineer - Remote / Telecommute
Cynet Systems Inc, Atlanta, Georgia, United States, 30383
Job Description
Responsibilities:
Develop and maintain scalable inference platforms for serving LLMs, optimized for NVIDIA and client GPUs.
Manage end-to-end cloud engineering projects from ideation and prototyping to deployment and operations.
Build and improve tooling and observability systems to monitor performance and system health.
Design benchmarking frameworks to test and evaluate model serving performance across models, engines, and GPU configurations.
Implement distributed inference optimization techniques, including tensor/data parallelism, KV cache optimizations, and intelligent routing (see the tensor-parallelism sketch after this list).
Build cross-platform inference support for diverse model architectures.
Contribute to open-source inference engines to enhance performance and efficiency.
Collaborate closely with cloud infrastructure, AI, and DevOps teams to ensure efficient deployment and scaling.
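For context on the distributed-inference responsibilities above, the following is a minimal sketch of tensor-parallel serving with vLLM's offline API. It is illustrative only: the model name and tensor_parallel_size value are placeholder assumptions, not details of this role's stack.

# Minimal sketch: tensor-parallel LLM inference with vLLM.
# The model name and tensor_parallel_size below are illustrative placeholders.
from vllm import LLM, SamplingParams

# Shard each layer's weights across 2 GPUs via tensor parallelism.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    tensor_parallel_size=2,                    # number of GPUs to shard across
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain KV cache reuse in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)

Data parallelism, by contrast, replicates the full model on each GPU and routes independent requests to the replicas; tensor parallelism splits a single model across GPUs so that larger models fit in memory and per-request latency drops.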
Requirements / Must Have:
Deep experience building services in modern cloud and distributed environments (Kubernetes, Docker, CI/CD, APIs, data storage, monitoring, logging, and alerting).
Experience hosting and running inference on Large Language Models (LLMs).
Strong communication skills with the ability to write detailed technical documentation.
Hands-on experience building or using benchmarking tools for evaluating LLM inference.
Familiarity with LLM performance metrics (prefill throughput, decode throughput, TPOT, TTFT), as illustrated after this list.
Experience with inference engines such as vLLM, SGLang, or Modular Max.
Familiarity with distributed inference serving frameworks (llm-d, NVIDIA Dynamo, Ray Serve, etc.).
Proficiency with NVIDIA and client GPU software stacks such as CUDA, ROCm, AITER, and NCCL.
Knowledge of distributed inference optimization techniques and GPU tuning strategies.
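As a concrete reference for the performance-metrics bullet above, here is a small worked sketch of how TTFT and TPOT are commonly derived from per-request timings; all timing and token-count values are invented for illustration.

# Sketch: deriving common LLM serving metrics from per-request timings.
# All timestamps and token counts are invented illustrative values.
request_start = 0.00    # seconds: request accepted
first_token_at = 0.35   # seconds: first output token emitted
request_end = 2.75      # seconds: last output token emitted
output_tokens = 120

ttft = first_token_at - request_start  # Time To First Token (dominated by prefill)
tpot = (request_end - first_token_at) / (output_tokens - 1)  # Time Per Output Token
decode_throughput = (output_tokens - 1) / (request_end - first_token_at)  # tokens/s

print(f"TTFT {ttft:.2f}s | TPOT {tpot * 1000:.1f} ms/token | decode {decode_throughput:.1f} tok/s")

Prefill throughput is measured analogously, over the prompt tokens processed before the first output token is produced.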
Skills:
Expertise in cloud infrastructure, containerization, and microservices.
Strong understanding of AI model inference and GPU acceleration.
Proficiency in Python, C++, or related programming languages.
Excellent problem-solving, analytical, and debugging skills.
Ability to collaborate in a dynamic and fast-paced environment.
Qualifications and Education:
Bachelor’s or Master’s degree in Computer Science, Artificial Intelligence, Electrical Engineering, or a related field.
Experience with AI infrastructure or LLM deployment platforms is highly preferred.