CyRAD Solutions

Site Reliability Engineer - SRE

CyRAD Solutions, Washington, District of Columbia, US, 20022

Strategic Site Reliability Engineer: Global Network Orchestration Platform

Design the core reliability platform for the final frontier of space mesh networking. This is a strategic, high‑impact mandate within a high‑growth, fast‑paced startup, building the next generation of software‑defined networks for satellite megaconstellations and aerospace fleets. We seek technical leaders ready to architect mission‑critical systems and drive platform maturity.

Technical Skills & Proficiencies Required

Observability Platform Mastery: deep, hands‑on expertise in the architecture, scaling, and management of production observability stacks: Prometheus, OpenTelemetry, Grafana, Loki, and distributed tracing systems.

Cloud & Orchestration: expert‑level production experience with Kubernetes and GCP. Expertise in multi‑cloud (AWS) environments is highly preferred.

Reliability Engineering: proven ability to define, implement, and manage robust SLOs, SLIs, and error budgets for high‑availability distributed systems, crucial for mission readiness.

Automation & IaC: mastery of Infrastructure as Code (Terraform) and GitOps (ArgoCD) for automated deployment and scaling across complex cloud environments.

Programming proficiency: strong command of systems programming; fluency in Go and/or Python is required for developing and optimizing platform tooling.

Preferred domain expertise: experience with Service Mesh (Istio/Linkerd), instrumenting applications in Golang/C++, and working with HPC environments (CPU/GPU workloads).

Mandatory Security Requirements

US citizenship is required.

An active Secret security clearance or higher is strongly preferred.

Seniority level: Mid‑Senior level

Employment type: Full‑time

Job function: Engineering and Information Technology

Industries: Executive Search Services
