asobbi
Solutions Architect (AI Infrastructure)
asobbi, San Francisco, California, United States, 94199
Base pay range
$160,000.00/yr - $200,000.00/yr
Job Title: Solutions Architect (GPU Cloud & AI Inference)
Location: New York / San Francisco
Employment: Full-time
Overview
Our client is a fast‑scaling organisation operating at the intersection of GPU Cloud infrastructure, high‑performance computing, and AI inference platforms. They provide next‑generation compute solutions that enable enterprise customers to deploy and scale sophisticated AI and ML workloads globally.
We are supporting them in the search for a Solutions Architect who brings deep technical credibility across GPU‑accelerated systems, distributed compute, and modern cloud‑native architectures. This individual will play a customer‑facing role while also offering internal technical leadership as the company continues its rapid expansion.
Key Responsibilities
Serve as the primary technical interface for enterprise customers evaluating GPU cloud and AI inference solutions.
Translate customer workloads (LLM inference, training, large‑scale compute) into robust architectural proposals.
Lead discovery workshops, technical scoping, and solution design across GPU clusters, high‑throughput networking, storage, and orchestration layers.
Create reference architectures, solution briefs, and best‑practice deployment patterns.
AI & GPU Infrastructure Expertise
Design performant and cost‑optimised architectures for AI inference, model deployment, parallel compute, and containerised workloads.
Provide guidance on GPU scheduling, inference optimisation, model serving platforms, and accelerator utilisation.
Work closely with product and engineering teams to validate customer requirements and shape the roadmap.
Leadership & Cross‑Functional Influence
Act as a senior technical leader within the customer engineering function, mentoring junior engineers and driving architectural standards.
Provide strategic insight to commercial teams on customer requirements, competitive positioning, and market trends.
Represent the company at industry events, technical briefings, and partner engagements where necessary.
Implementation & Delivery Oversight
Support customer onboarding and migration initiatives across GPU‑based cloud infrastructure.
Offer architectural governance and quality assurance throughout the deployment lifecycle.
Ensure solutions meet standards for scalability, performance, availability, and operational resilience.
Required Experience
Proven background as a Solutions Architect, Senior Systems Engineer, or similar technical role within HPC, GPU Cloud, AI/ML infrastructure, or large‑scale distributed systems.
Strong understanding of NVIDIA GPU architectures, the CUDA ecosystem, inference optimisation, MLOps workflows, container orchestration (Kubernetes, Kubeflow), and model serving (e.g. NVIDIA Triton).
Experience designing or deploying cloud‑native systems at scale, ideally covering compute clusters of thousands of GPUs or CPUs.
Knowledge of high‑performance networking (InfiniBand, RoCE), distributed storage (Ceph, Lustre, or similar), and IaC tooling (Terraform, Ansible).
Ability to work directly with customers and internal stakeholders to define architectures, manage expectations, and deliver outcomes.
Demonstrable leadership experience: mentoring, influencing engineering direction, or owning architectural governance.
Exposure to inference‑focused architectures, model serving frameworks, or LLM deployment patterns.
Understanding of cost modelling, workload benchmarking, and performance tuning for GPU‑accelerated environments.
Experience in a fast‑growing startup or high‑velocity cloud organisation.
Seniority level: Mid‑Senior level
Employment type: Full‑time
Job function: Engineering and Information Technology
Industries: IT Services and IT Consulting