CyRAD Solutions
Strategic Site Reliability Engineer: Global Network Orchestration Platform
Design the core reliability platform for the final frontier of space mesh networking. This is a strategic, high‑impact mandate within a high‑growth, fast‑paced startup, building the next generation of software‑defined networks for satellite megaconstellations and aerospace fleets. We seek technical leaders ready to architect mission‑critical systems and drive platform maturity.
Technical Skills & Proficiencies Required
Observability Platform Mastery: deep, hands‑on expertise in the architecture, scaling, and management of production observability stacks: Prometheus, OpenTelemetry, Grafana, Loki, and distributed tracing systems.
Cloud & Orchestration: expert‑level production experience with Kubernetes and GCP. Expertise in multi‑cloud (AWS) environments is highly preferred.
Reliability Engineering: proven ability to define, implement, and manage robust SLOs, SLIs, and error budgets for high‑availability distributed systems, crucial for mission readiness.
Automation & IaC: mastery of Infrastructure as Code (Terraform) and GitOps (ArgoCD) for automated deployment and scaling across complex cloud environments.
Programming proficiency: strong command of systems programming; fluency in Go and/or Python is required for developing and optimizing platform tooling.
Preferred domain expertise: experience with service meshes (Istio/Linkerd), instrumenting applications in Go/C++, and working with HPC environments (CPU/GPU workloads).
Mandatory Security Requirements
US citizenship is required.
An active Secret security clearance or higher is strongly preferred.
Seniority level: Mid‑Senior level
Employment type: Full‑time
Job function: Engineering and Information Technology
Industries: Executive Search Services