Confidential Education Management
Our organization is at the forefront of reinventing education through AI. We are a high-growth, mission-driven startup where your work directly impacts teachers, students, and entire school systems.
We are building the next generation of AI-driven education technology, and we need a DevOps Systems Engineer to ensure our cloud platform is fast, reliable, and resilient. This is a hands-on role focused on operational excellence, developer experience, and customer responsiveness. You will automate deployments, harden infrastructure, and make sure our multi-agent learning platform scales securely and smoothly as adoption grows.
You’ll work closely with the Systems Architect to design scalable topologies, with the Engineering Leader to streamline CI/CD pipelines and developer workflows, and with the AI/Data Science Leader to deploy and monitor model-serving infrastructure. Your work will directly impact how quickly we can respond to schools, ship improvements, and recover from incidents — making you a critical enabler of customer trust and satisfaction.
Key Responsibilities
Platform Automation & CI/CD
Build and maintain CI/CD pipelines for microservices and AI models. Automate infrastructure with Terraform, Helm, and ArgoCD for reproducibility and speed.
Operations & Monitoring
Deploy and manage observability stacks: Prometheus, Grafana, Loki, Sentry, PostHog, Alloy. Instrument systems for metrics, logging, tracing, and error detection to improve uptime and recovery. Maintain service-level dashboards and alerting for production systems.
Resilience & BC/DR
Implement backup, failover, and disaster recovery strategies to ensure ≥99.9% uptime. Run DR tests and incident simulations to validate recovery plans. Shorten lead time for changes and improve local-to-production consistency. Provide self-service environments for developers and QA. Support school pilots, rollouts, and live trials by ensuring platform readiness. Rapidly address production issues to minimize impact on teachers and students.
Required Education & Experience
BS in Computer Science, Engineering, or a related discipline, and/or equivalent experience. 5+ years of hands-on DevOps or systems engineering experience in SaaS or platform environments. Strong cloud experience (AWS, GCP, or Azure), including virtualization technologies, virtual machine environments, and VM and container orchestration (Kubernetes/OpenShift). Solid knowledge of and administrative experience with Linux distributions (e.g., Ubuntu, Debian, RHEL, NixOS), cloud networking administration, and Windows and macOS on the client side. Solid programming and database skills: Python, React, Node.js, Java, JSON, SQL, NoSQL. Expertise with CI/CD tools (Azure DevOps, GitHub Actions, Jenkins, etc.). Familiarity with observability stacks (Prometheus, Grafana, Loki, Sentry, PostHog) and log aggregation pipelines. Experience implementing BC/DR procedures and failover strategies. Knowledge of networking (routing, TLS certificates), secrets management, and secure RBAC. Educational technology (EdTech) or SaaS platform experience is a plus. Startup experience is a plus.