Cyrad Solutions LLC

Site Reliability Engineer - SRE

Cyrad Solutions LLC, Washington, District of Columbia, US, 20022

Strategic Site Reliability Engineer: Global Network Orchestration Platform

The Opportunity:

Design the core reliability platform for the final frontier of space mesh networking. This is a strategic, high-impact mandate within a high-growth, fast-paced startup building the next generation of software-defined networks for satellite megaconstellations and aerospace fleets.

We seek technical leaders ready to architect mission-critical systems and drive platform maturity.

Technical Skills & Proficiencies Required

Observability Platform Mastery:

Deep, hands-on expertise in the architecture, scaling, and management of production observability stacks: Prometheus, OpenTelemetry, Grafana, Loki, and distributed tracing systems.

Cloud & Orchestration:

Expert-level production experience with Kubernetes and GCP. Expertise in multi-cloud (AWS) environments is highly preferred.

Reliability Engineering:

Proven ability to define, implement, and manage robust SLOs, SLIs, and Error Budgets for high-availability distributed systems, crucial for mission readiness.

Automation & IaC:

Mastery of Infrastructure as Code (Terraform) and GitOps (ArgoCD) for automated deployment and scaling across complex cloud environments.

Programming Proficiency:

Strong command of systems programming; fluency in Go and/or Python is required for developing and optimizing platform tooling.

Preferred Domain Expertise:

Experience with Service Mesh (Istio/Linkerd), instrumenting applications in Golang/C++, and working with HPC environments (CPU/GPU workloads).

Mandatory Security Requirements

US citizenship is required. An active Secret security clearance or higher is strongly preferred.
