Salient

Staff Infrastructure Engineer

Salient, California, Missouri, United States, 65018

We are hiring a Staff Infrastructure Engineer to design, build, and operate scalable, production-grade infrastructure from the ground up for enterprise and big-bank clients with critical data. You'll own and develop our deployment pipelines, observability systems, and cloud infrastructure as we transition to a Kubernetes-based architecture. This is an onsite role in San Francisco; close collaboration with product, engineering, and leadership is critical. You'll have significant autonomy: we're looking for someone who can operate independently, set technical direction, and build systems that scale without extensive guidance.

Responsibilities

- Architect and implement new scalable, reliable, and secure infrastructure for real-time AI-driven voice services.
- Support our migration to Kubernetes (from ECS) and establish infrastructure best practices.
- Build, optimize, and maintain CI/CD pipelines to support rapid and safe deployments.
- Own monitoring, alerting, and incident response systems to ensure uptime and performance.
- Serve as the primary point of contact for on-call responsibilities to keep systems up 24/7.
- Automate operational workflows and infrastructure provisioning (Infrastructure-as-Code).
- Collaborate with engineering teams to debug live issues, improve system resilience, and optimize performance.

Requirements

- 5+ years of DevOps, SRE, or infrastructure engineering experience, including experience leading projects independently.
- Deep expertise in cloud environments (AWS, GCP, or similar).
- Strong experience with containerization and orchestration (Docker and Kubernetes) in production environments.
- Strong awareness of networking concepts and how to implement them within AWS (DNS, HTTP(S), SSH, FTP, SMTP, firewalls, NAT).
- Proficiency with infrastructure-as-code tools (Terraform, Helm, etc.).
- Strong software engineering skills, with demonstrated proficiency in languages commonly used for infrastructure automation, such as Python, Go, and Bash.
- Experience designing monitoring and alerting systems (e.g., Prometheus, Datadog, Grafana).
- Strong understanding of security, reliability, and scaling best practices for cloud-native systems.
- Excellent communication skills and a hands-on, ownership-driven mindset.
- Willingness to work long hours: 8 am to 7 pm is a good day.
- In-person in San Francisco four days a week.

Nice to have

- Experience working with real-time communication systems (e.g., SIP, WebRTC, LiveKit).
- Background in highly regulated industries (e.g., financial services, healthcare).
