Integrated Research Ltd.

Principal DevSecOps Engineer

Integrated Research Ltd., Georgia Center, Vermont, United States


IR Labs is the innovation lab inside Integrated Research where small, cross-functional squads chase outsized, industry-defining opportunities. We operate like a funded startup (rapid sprints, bold experimentation, zero bureaucracy), backed by the global footprint and resources of a public company. Our charter is simple: turn cutting-edge AI research into products that customers can’t imagine working without. We target the hardest problems in software and then move fast to ship solutions that create 10x impact. If you thrive on autonomy, crave world-class technical challenges, and want to see your ideas hit production quickly, IR Labs is your launch pad. Join us and help build the future, one breakthrough at a time.

Job Description

Are you a talented DevSecOps Engineer looking to play a foundational role in building a scalable and resilient AI and machine learning innovation lab? Do you thrive in a dynamic environment where your expertise in cloud infrastructure, automation, and operational excellence directly impacts the success of cutting-edge AI applications? If you have a passion for designing and managing high-availability, fault-tolerant distributed systems, we want you on our team!

As a DevSecOps Engineer at IR Labs, you will be responsible for designing, implementing, and maintaining the core infrastructure that enables the company to scale. You’ll work closely with engineering teams to build robust cloud-based environments, automate workflows, and ensure operational excellence across AI/ML platforms. If this sounds exciting to you, then we want to meet you.

What You’ll Do

Serve as the foundational infrastructure engineer, responsible for designing and implementing the core systems and processes that will enable the company to scale, ensuring a robust platform for future engineering growth and operational excellence.
Develop and maintain best practices and patterns for deploying cloud-based infrastructure as code (IaC) securely, reliably, and efficiently using tools like Terraform or AWS CloudFormation.
Partner with engineering teams to support services before they are Generally Available (GA) by contributing to system design consulting, capacity planning, and readiness reviews.
Define and manage SLIs, SLOs, and SLAs for services, infrastructure, and operational processes running in production, ensuring consistent service delivery.
Eliminate toil through end-to-end automation across infrastructure provisioning, configuration management (Ansible), CI/CD pipelines (GitHub Actions, ArgoCD), testing, and operations.
Collaborate with machine learning engineers, backend developers, and security experts to incrementally and securely build out platforms that enable cutting-edge AI/ML applications.
Maintain a deep understanding of the business’s long-term goals and ensure that system design, architecture, and availability align with these objectives.
Design robust disaster recovery and multi-region failover solutions, carefully balancing availability, consistency, and cost.
Research and experiment with emerging technologies and tools in availability, monitoring, high availability (HA), and capacity planning to future-proof the infrastructure.
Establish and promote disciplined production engineering processes and best practices to ensure high standards of operational excellence and reliability.

Desired Skills and Experience

Qualifications

Extensive experience operating high-availability, fault-tolerant, and scalable distributed systems in production using GitOps practices with tools like Terraform or AWS CloudFormation.
Strong programming skills in Golang, Python, or Rust, with additional proficiency in shell scripting for automating tasks and patching bleeding-edge features (e.g., Kubernetes CRDs).
Expertise with monitoring and observability solutions like Prometheus, Grafana, Fluentd, Jaeger, and OpenTelemetry.
Mastery of containerization concepts and systems, particularly Kubernetes (AWS EKS), with experience customizing Kubernetes using CRDs or other advanced features.
Comfort operating and scaling infrastructure in a cloud-based ecosystem, with preference for AWS (e.g., VPCs, EC2, S3, IAM, RDS).
Knowledge of TypeScript for creating developer portals or integrations, e.g., with Backstage.
Solid understanding of networking concepts, including traffic management, load balancing, and multi-region failover strategies.
Experience designing and deploying secure, production-grade services, SDKs, and infrastructure that emphasize performance, scalability, and self-service capabilities.
Proficiency with cloud security best practices, including IAM, secrets management (AWS Secrets Manager), and compliance monitoring (SOC 2, HIPAA, GDPR).
Demonstrated ability to define and enforce SRE principles (e.g., SLIs, SLOs, error budgets) and implement high standards of operational reliability.
Experience with software engineering best practices such as unit testing, peer code reviews, design documentation, and continuous delivery.
Strong troubleshooting skills for debugging distributed systems, with expertise in incident response and root cause analysis.
Ability to thrive in ambiguous, fast-paced environments, conceptualizing and articulating ideas clearly and concisely.
Proven track record of mentoring engineers and fostering a DevSecOps culture across cross-functional teams.
Ability to effectively collaborate with stakeholders (data scientists, ML engineers, backend developers) to design infrastructure that empowers teams.

Nice to Haves

Educational Background: Bachelor’s or Master’s degree in Computer Science, Engineering, Mathematics, or a related field.
Security Focus: Familiarity with security audits, compliance tools, and advanced monitoring techniques to ensure airtight cloud operations.
Advanced Data Infrastructure: Familiarity with streaming systems like Kafka or Apache Flink and scalable storage solutions such as Delta Lake (Databricks) or DynamoDB.
MLOps Expertise: Experience with orchestration tools like Flyte, model-serving frameworks (e.g., Triton, Ray), and experiment tracking tools like MLflow or Weights & Biases.
Emerging Tools: Knowledge of bleeding-edge AI/ML service integrations (e.g., AWS Bedrock, Kong AI Gateway) and agentic frameworks like LangGraph.

Our job descriptions often reflect our ideal candidate. If you have a strong foundation of relevant skills and a passion for this field, we encourage you to apply, even if you don't check every box.

What We Offer

Culture: Join a passionate, driven team that values collaboration, innovation, and having fun while making a difference.
High-Impact Ownership: Your code and ideas will go live in weeks, not quarters. Every engineer owns features end-to-end and sees their work land in production with Fortune-grade customers.
Innovation: Work on cutting-edge AI solutions that solve real-world problems and shape the future of technology.
Growth: Opportunity for personal and professional growth as the company scales.
Flexible Work Culture: Benefit from a flexible work environment that promotes work-life balance and remote work.
Competitive Compensation: Receive a competitive salary, performance bonuses, equity participation, and a generous benefits package.
401k with Employer Contributions
Health Savings Account (HSA) Contributions with High Deductible Health Plan
Short-Term/Long-Term Disability Insurance
And more!

Compensation Range

$180,000 - $200,000 base
$50,000 - $60,000 variable compensation

The actual compensation offered to a candidate may vary from the posted hiring range based upon geographic location, work experience, education, and/or skill level. The pay ratio between base pay and target incentive (if applicable) will be finalized at the offer stage.

At IR we celebrate, support, and thrive on difference for the benefit of our employees, our products, and our community. We are proud to be an Equal Employment Opportunity employer and encourage applications from all suitable candidates; we never discriminate based on race, religion, national origin, gender identity or expression, sexual orientation, age, or marital, veteran, or disability status.
