Sonia
Oct 12, 2025 - Sonia is hiring a remote ML Platform Engineer.
Salary: competitive, depending on experience. Location: Germany or Luxembourg.
With Sonia, doctors are successful doctors. We create and deploy AI-enhanced solutions that make doctors’ lives easier, patients’ care better, and healthcare systems more efficient. If you’re an intrinsically motivated self-starter who values impactful work, join us in revolutionizing healthcare.
We’re looking for an experienced
ML Platform Engineer (all)
with deep Kubernetes expertise to support the infrastructure powering our AI and ML workloads. You’ll work closely with ML engineers on everything from deploying cutting-edge LLM inference to refining observability and automating workflows—always with reliability, scalability, and performance as your guiding principles.
This role can be performed remotely from anywhere in Germany or Luxembourg, or in a hybrid setup from our offices in Luxembourg or Berlin.
What you’ll own
Support and enhance our Kubernetes-based infrastructure in cloud environments, running both ML/LLM workloads and general applications
Deploy and optimize LLM inference systems
Design, build, and improve MLOps/DevOps pipelines to support the entire development lifecycle
Manage GPU scheduling and autoscaling with Kubernetes-native tooling
Ensure observability and alerting across the platform
Operate and troubleshoot supporting infrastructure
Contribute to platform reliability, security, and performance through automation and best practices
You’ll thrive in this role if you bring
5+ years of experience in MLOps or SRE
Strong hands-on Kubernetes experience, including GitOps (Flux or ArgoCD), Kustomize, Helm, and production troubleshooting
Familiarity with LLM inference deployment and optimization in Kubernetes (e.g., vLLM, LMCache, llm-d)
Experience with MLOps supporting tools such as MLflow or Argo Workflows
Understanding of GPU resource orchestration in Kubernetes environments
In-depth knowledge of observability tools such as VictoriaMetrics, VictoriaLogs, and Grafana
Experience administering databases and message brokers (PostgreSQL, Redis, and RabbitMQ)
Solid scripting skills in Python
Comfortable working with cloud platforms (OVHcloud, AWS, GCP or Azure)
Nice-to-Haves
Experience with audio ML models or real-time inference
Exposure to CI/CD practices tailored for ML systems
Familiarity with Kubernetes networking, security, or performance tuning
Why you’ll love working with us
Full ownership of a mission-critical platform
A team that values curiosity, learning, and experimentation
Remote-first setup with the option to work in our Berlin office
Competitive salary depending on experience
Work on AI infrastructure that directly impacts healthcare innovation
Ready to apply?