Sonia
ML Platform Engineer (all)
With Sonia, doctors become more successful doctors. We create and deploy AI‑enhanced solutions that make doctors’ lives easier, patients’ care better, and healthcare systems more efficient. If you’re an intrinsically motivated self‑starter who values impactful work, join us in revolutionizing healthcare.
Location:
Remote from anywhere in Germany or Luxembourg, or in a hybrid setup from our offices in Luxembourg or Berlin.
Responsibilities
Support and enhance our Kubernetes‑based infrastructure in cloud environments, running both ML/LLM workloads and general applications.
Deploy and optimize LLM inference systems.
Design, build, and improve MLOps/DevOps pipelines to support the entire development lifecycle.
Manage GPU scheduling and autoscaling with Kubernetes‑native tooling.
Ensure observability and alerting across the platform.
Operate and troubleshoot supporting infrastructure.
Contribute to platform reliability, security, and performance through automation and best practices.
Qualifications
5+ years of experience in MLOps or SRE.
Strong hands‑on Kubernetes experience, including GitOps (Flux or ArgoCD), Kustomize, Helm, and production troubleshooting.
Familiarity with LLM inference deployment and optimization in Kubernetes (e.g., vLLM, LMCache, llm‑d).
Experience with MLOps supporting tools such as MLflow or Argo Workflows.
Understanding of GPU resource orchestration in Kubernetes environments.
In‑depth knowledge of observability tools such as VictoriaMetrics, VictoriaLogs, and Grafana.
Experience administering databases and message brokers (PostgreSQL, Redis, and RabbitMQ).
Solid scripting skills in Python.
Comfortable working with cloud platforms (OVHcloud, AWS, GCP, or Azure).
Nice‑to‑Haves
Experience with audio ML models or real‑time inference.
Exposure to CI/CD practices tailored for ML systems.
Familiarity with Kubernetes networking, security, or performance tuning.
Benefits
Full ownership of a mission‑critical platform.
A team that values curiosity, learning, and experimentation.
Remote‑first setup with the option to work in our Berlin office.
Competitive salary depending on experience.
Work on AI infrastructure that directly impacts healthcare innovation.