Cyrad Solutions LLC
Site Reliability Engineer - SRE
Cyrad Solutions LLC, Washington, District of Columbia, US, 20022
Strategic Site Reliability Engineer: Global Network Orchestration Platform
The Opportunity:
Design the core reliability platform for the final frontier of space mesh networking. This is a strategic, high-impact mandate within a high-growth, fast-paced startup building the next generation of software-defined networks for satellite megaconstellations and aerospace fleets.
We seek technical leaders ready to architect mission-critical systems and drive platform maturity.
Technical Skills & Proficiencies Required
Observability Platform Mastery:
Deep, hands-on expertise in the architecture, scaling, and management of production observability stacks: Prometheus, OpenTelemetry, Grafana, Loki, and distributed tracing systems.
Cloud & Orchestration:
Expert-level production experience with Kubernetes and GCP. Expertise in multi-cloud (AWS) environments is highly preferred.
Reliability Engineering:
Proven ability to define, implement, and manage robust SLOs, SLIs, and error budgets for high-availability distributed systems, crucial for mission readiness.
Automation & IaC:
Mastery of Infrastructure as Code (Terraform) and GitOps (ArgoCD) for automated deployment and scaling across complex cloud environments.
Programming Proficiency:
Strong command of systems programming; fluency in Go and/or Python is required for developing and optimizing platform tooling.
Preferred Domain Expertise:
Experience with Service Mesh (Istio/Linkerd), instrumenting applications in Golang/C++, and working with HPC environments (CPU/GPU workloads).
Mandatory Security Requirements
US citizenship is required. An active Secret security clearance or higher is strongly preferred.