Optimal Dynamics, Inc
Staff Software Engineer, Site Reliability (SRE)
Optimal Dynamics, Inc, Aurora, Colorado, United States, 80012
Remote

About Our Company
Built on over four decades of pioneering research at Princeton University, our platform represents the leading edge of innovation in freight and transportation planning. We help customers unlock double-digit revenue gains and drive smarter, data-driven operations at scale. We're a high-growth company of ~70 employees, backed by investors including Bessemer Venture Partners, The Westly Group, Activate Capital, and Koch. We recently completed a Series C financing round led by Koch Disruptive Technologies and are entering a new phase of growth. We're on a mission to redefine the way logistics decisions are made, and we're just getting started.

About Our Team
We are a team of bright, kind, and solution-oriented people focused on creating value for our customers. We solve problems collectively and strive for solutions that are secure, reliable, maintainable, and scalable for the long run.

About the Role
We're hiring a Staff Software Engineer, Site Reliability to lead reliability across our production platform. As a Staff-level IC, you will drive strategy and hands-on execution across incident response, SLO/SLI programs, and production readiness, directly owning highly available services in AWS. You will also partner with Platform/Infra to build paved-road tooling in our monorepo. This is a full-time, remote-friendly role open to candidates across the United States. Our HQ in New York City is available for in-office collaboration.

What You'll Do
Reliability (~50%)
Own the company-wide incident lifecycle: standards for detection, escalation, incident command, customer communications, and high-quality postmortems with action tracking.
Define and drive SLIs/SLOs for core services; build guardrails and dashboards that make reliability visible and actionable.
Lead production readiness reviews, capacity/performance planning, load testing, disaster recovery exercises, and resilience engineering (failure testing/chaos where appropriate).
Level up on-call: right-sizing rotations, paging hygiene, runbooks, auto-remediation, and continuous improvement of MTTA/MTTR.

Security (~30%)
Embed security into the delivery pipeline: dependency and image scanning, least-privilege/IAM baselines, secrets management, and service-to-service auth.
Partner with Engineering leadership to maintain SOC 2-aligned controls as code; make audit-friendly evidence generation part of everyday engineering.
Drive secure-by-default patterns in the platform (network posture, data protection, runtime policies) without slowing down developers.
Build and evolve paved roads for deploys, config, and runtime operations in our monorepo (Bazel) and CI/CD (AWS CodePipeline/CodeBuild).
Partner with product teams to make the secure, reliable default the easiest path: templates, tooling, libraries, and automation.

Who You Are
Experienced: Staff-level IC who has led reliability programs at meaningful scale and owned incident response standards.
Technically Grounded: Deep, hands-on experience with infrastructure at scale, cloud, containerization, and more.
  ECS and/or Kubernetes containerized workloads
  CI/CD & IaC (Terraform)
Python Proficient: You can read/review service code and land operational improvements.
Data Driven: In your approach to SLOs, capacity, performance, and cost efficiency, with strong observability chops.
Influential: Able to shape direction and create simple, durable standards.
Communicative: Excels in both technical and interpersonal communication, with strong written and verbal skills.

Nice To Have (Bonus Points)
Awareness of FinOps (cost attribution, efficient scaling) and experience with DR/BCP programs.
Familiarity with secure SDLC, threat modeling, and compliance automation in a SOC 2 context.
Experience collaborating with Data Science/ML teams and working with batch/streaming workloads.
Exposure to monorepo frameworks such as Bazel or Buck.

About our tech stack and development practices
At Optimal Dynamics, our entire infrastructure runs on AWS, leveraging services including DynamoDB, Aurora, SSM, and SQS.
Backend & AI: Python 3 and Java.
Data Stack: Trino, Dagster, dbt, DuckDB, and Preset.
IaC: Terraform and Spacelift.
Cloud: AWS (ECS/RDS/S3/etc.).
CI/CD: Bazel, GitHub, AWS CodePipeline/CodeBuild.
We follow modern development practices with code stored on GitHub. Every pull request undergoes thorough code review, is fully unit tested, and is deployed through our CI/CD pipeline.

Pay Range
$180,000 - $220,000 USD
Competitive compensation, including Series C level equity
Health / Dental / Vision 100% covered for employee and 50% for dependents
Life Insurance, with optional supplemental insurance
Flexible Spending Account (FSA)
Health Savings Account (HSA)
401(k) with match
Unlimited PTO (vacation, personal days, sick days, jury duty, military leave, bereavement)
11 Holidays
Paid Parental Leave for all employees
Short-term and Long-term Disability Insurance, and AD&D Insurance
Fitness membership reimbursement

Optimal Dynamics is an equal opportunity employer. We are committed to creating an inclusive workplace and recruiting from a diverse candidate pool. If you require an accommodation during the interview process, please email careers@optimaldynamics.com.