Supernova Technology™

Senior Site Reliability Engineer

Supernova Technology™, Chicago, Illinois, United States, 60290

Overview

About Us: Founded in 2014, we offer a cloud-based, end-to-end software solution to automate securities-based lending from origination through the life of the loan. We enable advisors to deliver holistic, goals-based advice and help clients achieve financial wellness. We partner with banks, insurance companies, and online brokerages to democratize access to securities-based lending.

Job Description: The Senior Site Reliability Engineer will own the reliability, scalability, and performance of our production systems. This role bridges engineering, platform, and security teams to ensure infrastructure meets uptime, compliance, and client experience requirements. You will lead the design and implementation of observability tools, incident response processes, and resilience strategies, shifting the organization from reactive to proactive reliability practice.

Responsibilities

Ensure systems meet high-availability targets through well-defined SLAs, SLOs, and SLIs
Own and optimize the monitoring, logging, and alerting stack to ensure actionable alerts
Lead incident response and postmortem processes, driving remediation and prevention
Plan capacity and optimize performance to address bottlenecks before they impact customers
Automate operational tasks to reduce manual intervention
Collaborate with DevOps to improve CI/CD reliability and with Platform Engineering to ensure infrastructure scalability
Implement reliability controls required for SOC 2 and other regulatory standards

Qualifications

5-8 years in SRE, operations, or performance engineering roles
Bachelor's Degree in Computer Science or related fields
Advanced expertise with monitoring and alerting tools
Proficiency in at least one programming or scripting language such as Python, Go, or Bash
Strong background in AWS cloud environments
Experience with container orchestration using AWS ECS
Proven track record in leading high-severity incident response calmly and effectively
Familiarity with ITIL, postmortem processes, and change management controls
Demonstrated ability to work cross-functionally with development, platform, and security teams
Reliability-focused mindset with an emphasis on uptime and recovery speed
Analytical problem-solving skills supported by metrics and data
Calm and effective performance in high-pressure situations
Technical depth to diagnose and resolve complex system issues
Proactive leadership in anticipating and addressing reliability risks
