Flower Labs

Senior DevOps Engineer

Flower Labs, Germantown, Ohio, United States

Overview

Senior DevOps Engineer at Flower Labs. Design, automate, and operate the infrastructure powering Flower’s open-source and enterprise platforms. Collaborate with backend, research, and frontend teams to ensure systems are scalable, reliable, and secure. Drive infrastructure-as-code, observability, and CI/CD practices in modern cloud environments.

For this position we are currently hiring in the UK and Germany but are open to other Europe-based applicants to better align with team time zones.

Flower Labs is the world-class AI startup behind Flower, the popular open-source framework for training AI on distributed data and compute resources using federated learning. Trusted by industry leaders and a growing developer community, Flower enables organizations to work with private data across silos or devices while advancing AI capabilities.

Responsibilities

Design, implement, and maintain scalable, secure, and resilient cloud infrastructure using Terraform, OpenTofu, and Ansible. Develop infrastructure automation and deployment strategies across AWS, GCP, and Azure. Define and enforce best practices for GitOps, configuration management, and infrastructure lifecycle. Collaborate with engineering teams to design and evolve Kubernetes-based deployments for Flower’s products and open-source systems. Contribute to the long-term infrastructure roadmap, ensuring scalability and operational excellence.

Operations & Reliability

Build and operate production-grade Kubernetes clusters, container runtimes, and CI/CD workflows. Develop monitoring and alerting pipelines using Prometheus, Grafana, and modern observability stacks. Maintain system reliability, resilience, and uptime through automation, runbooks, and continuous delivery. Continuously improve system performance, cost-efficiency, and security posture. Troubleshoot and resolve complex production issues, promoting a culture of proactive observability.

CI/CD & Workflow Automation

Design and maintain robust CI/CD pipelines in GitLab and/or GitHub Actions. Implement and evolve GitOps workflows using ArgoCD and Helm. Support engineering teams by automating testing, deployment, and infrastructure provisioning processes. Standardize CI/CD best practices and empower teams to deploy safely and autonomously.

Performance & Reliability

Profile, optimize, and refactor critical code paths to improve performance under real-world workloads. Design scalable storage, messaging, and computation solutions for federated and distributed systems. Ensure system reliability and resilience through automation, CI/CD, and observability practices.

Security, Access, and Compliance

Implement secure design patterns for communication, authentication, and infrastructure management. Manage IAM, secret handling, and access control across multi-cloud environments. Contribute to security reviews and audits, including key management and network hardening. Collaborate on privacy-preserving infrastructure strategies supporting federated learning systems. Ensure compliance with internal and external standards for data protection and security.

Collaboration & Open Source

Collaborate with the open-source Flower community to improve deployment and observability tooling. Review and guide community contributions related to DevOps, infrastructure, and CI/CD. Document infrastructure standards, provide training, and share knowledge across teams. Represent Flower in relevant community events, conferences, and technical forums.

About the Team

You can expect a mission-driven, collaborative, fast-paced startup environment with experts in their fields. You will have opportunities to contribute ideas and influence the direction of the company. We value collaboration over competition and work to win as one team.

About You

We’re looking for a strategic and hands-on DevOps engineer who’s passionate about building resilient infrastructure, automating complex systems, and empowering engineering teams to deliver AI with speed, reliability, and security.

Must-have Qualifications

Proven experience with Terraform, OpenTofu, and Ansible
Strong knowledge of Kubernetes, Docker, and GitOps workflows (ArgoCD)
Hands-on experience with Prometheus, Grafana, and modern observability stacks
Proficiency with GitLab and/or GitHub CI/CD pipelines
Strong Linux (Debian, Ubuntu) administration experience
Practical experience with AWS, GCP, or Azure
Excellent written and verbal communication in English
Self-driven, collaborative, and comfortable with asynchronous remote work

Optional Qualifications

Experience with OpenShift or OpenTelekomCloud
Familiarity with Helm, Kapitan, and advanced templating tools
Knowledge of Keycloak and IAM best practices
Understanding of distributed systems, networking, or site reliability engineering (SRE) principles
An understanding of machine learning
Hands-on experience in PyTorch and multi-GPU environments