TrekAI
Role Purpose
TrekAI is at the forefront of reinventing education through AI. We are a high‑growth, mission‑driven startup where your work directly impacts teachers, students, and entire school systems.
TrekAI is building the next generation of AI‑driven education technology, and we need a DevOps Systems Engineer to ensure our cloud platform is fast, reliable, and resilient. This is a hands‑on role focused on operational excellence, developer experience, and customer responsiveness. You will automate deployments, harden infrastructure, and make sure TrekAI’s multi‑agent learning platform scales securely and smoothly as adoption grows.
You’ll work closely with the Systems Architect to design scalable topologies, with the Engineering Leader to streamline CI/CD pipelines and developer workflows, and with the AI/Data Science Leader to deploy and monitor model‑serving infrastructure. Your work will directly impact how quickly TrekAI can respond to schools, ship improvements, and recover from incidents — making you a critical enabler of customer trust and satisfaction.
Key Responsibilities
Platform Automation & CI/CD
Build and maintain CI/CD pipelines for microservices and AI models.
Automate infrastructure with Terraform, Helm, and ArgoCD for reproducibility and speed.
Operations & Monitoring
Deploy and manage observability stacks: Prometheus, Grafana, Loki, Sentry, PostHog, Alloy.
Instrument systems for metrics, logging, tracing, and error detection to improve uptime and recovery.
Manage and maintain service‑level dashboards and alerting for production systems.
Resilience & BC/DR
Implement backup, failover, and disaster recovery strategies to ensure ≥99.9% uptime.
Run DR tests and incident simulations to validate recovery plans.
Developer Experience
Shorten lead time for changes and improve local‑to‑production consistency.
Provide self‑service environments for developers and QA.
Customer Responsiveness
Support school pilots, rollouts, and live trials by ensuring platform readiness.
Rapidly address production issues to minimize impact on teachers and students.
Required Education & Experience
BS in Computer Science, Engineering, or a related discipline, and/or equivalent.
5+ years of hands‑on DevOps or systems engineering experience in SaaS or platform environments.
Strong cloud experience (AWS, GCP, or Azure) with virtualization technologies, virtual machine environments, and VM and container orchestration (Kubernetes/OpenShift).
Solid knowledge of and administrative experience with Linux distributions (e.g., Ubuntu, Debian, RHEL, NixOS), cloud networking administration, and Windows and macOS on the client side.
Solid programming and database skills: Python, React, Node.js, Java, JSON, SQL, NoSQL.
Expertise with CI/CD tools (Azure DevOps, GitHub Actions, Jenkins, etc.).
Familiarity with observability stacks: Prometheus, Grafana, Loki, Sentry, PostHog, and log aggregation pipelines.
Experience implementing BC/DR procedures and failover strategies.
Knowledge of networking (routing, TLS certs), secrets management, and secure RBAC.
Educational technology (EdTech) or SaaS platform experience is a plus.
Startup experience is a plus.