Integrated Research

Principal Data Infrastructure Engineer

Integrated Research, Denver, Colorado, United States, 80285


IR Labs is the innovation lab inside Integrated Research where small, cross-functional squads chase outsized, industry-defining opportunities. We operate like a funded startup: rapid sprints, bold experimentation, and zero bureaucracy, backed by the global footprint and resources of a public company. Our charter is simple: turn cutting-edge AI research into products that customers can't imagine working without. We target the hardest problems in software and then move fast to ship solutions that create 10x impact. If you thrive on autonomy, crave world-class technical challenges, and want to see your ideas hit production quickly, IR Labs is your launch pad. Join us and help build the future, one breakthrough at a time.

Job Description

Are you a talented Data Infrastructure Engineer looking to make a significant impact in a rapidly evolving AI and machine learning innovation lab? Do you thrive in a fast-paced setting where your work bridges the gap between data infrastructure, MLOps, and scalable cloud-native architectures? If you have a passion for building high-performance, real-time data systems that power cutting-edge AI applications, we want you on our team!

As a Data Infrastructure Engineer at IR Labs, you will play a foundational role in designing, implementing, and scaling mission-critical data systems and workflows. You'll work closely with machine learning engineers, backend developers, and DevSecOps teams to create robust data pipelines, real-time streaming architectures, and automation frameworks that accelerate AI innovation. If this sounds exciting to you, then we want to meet you.

What You'll Do
- Serve as the foundational data infrastructure engineer, responsible for designing, implementing, and scaling the core data systems and workflows that empower machine learning engineers (MLEs) and software engineers to independently build and operate data pipelines.
- Build and maintain stream-first data architectures (Kappa) using tools like Apache Kafka, Apache Flink, or Spark Streaming, ensuring low-latency, real-time data processing at scale (a minimal sketch follows this list).
- Develop and implement infrastructure-as-code (IaC) solutions with tools like Terraform or AWS CloudFormation, ensuring scalable, repeatable deployments of data infrastructure on AWS.
- Automate infrastructure operations, data pipeline orchestration, and CI/CD workflows using GitHub Actions, ArgoCD, and configuration management tools like Ansible.
- Partner closely with MLEs, backend developers, and DevSecOps professionals to create a unified platform that supports data lake, data streaming, and MLOps pipelines.
- Enable self-service capabilities by building SDKs, APIs, and portal integrations (e.g., Backstage) for seamless onboarding and management of data workflows by other engineers.
- Ensure observability and reliability by deploying monitoring (Prometheus, Grafana), logging (ELK stack + Fluentd), and tracing (Jaeger) across data workflows and infrastructure.
- Design and implement robust data security, IAM policies, and secrets management solutions leveraging AWS IAM, AWS Secrets Manager, and AWS Security Hub.
- Research and evaluate emerging data and MLOps technologies, such as Flyte, Ray, Triton Inference Server, and Databricks, to ensure the platform evolves with industry best practices.
- Establish and promote best practices for data infrastructure automation, stream processing, and scaling workflows across batch and real-time use cases.
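For a flavor of the stream-first work described above, here is a minimal Python sketch of a Kafka consume-transform-produce loop. It assumes the kafka-python client, a local broker, and hypothetical topic names (events.raw, events.enriched); real pipelines of this kind would add schema validation, error handling, and metrics, or run on Flink/Spark Streaming instead.

    # Minimal stream-processing sketch (kafka-python; broker and topic names are hypothetical).
    import json

    from kafka import KafkaConsumer, KafkaProducer

    consumer = KafkaConsumer(
        "events.raw",                                # hypothetical input topic
        bootstrap_servers="localhost:9092",          # assumed local broker
        group_id="enrichment-worker",
        auto_offset_reset="earliest",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # Consume, enrich, and re-publish each record downstream.
    for message in consumer:
        event = message.value
        event["processed"] = True                    # placeholder enrichment step
        producer.send("events.enriched", event)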

Desired Skills and Experience

Qualifications
- Extensive experience (8+ years) building and operating scalable, real-time data infrastructure with a focus on stream processing using tools like Kafka, Apache Flink, or Spark Streaming.
- Strong expertise in data lake/warehouse architectures using Delta Lake (Databricks) with S3 or similar backing stores.
- Proficiency with MLOps tooling, including orchestration (e.g., Flyte), experiment tracking (e.g., MLflow, Weights & Biases), and model serving (e.g., Triton, Ray, vLLM).
- Solid understanding of container orchestration with Kubernetes (AWS EKS) and containerization using Podman or Docker.
- Hands-on experience with AWS services for data infrastructure, including S3, Glue, Lambda, Redshift, and Athena.
- Proven ability to design and enforce data security best practices, including IAM, role-based access control (RBAC), and centralized secrets management (AWS Secrets Manager).
- Familiarity with metadata management solutions such as Unity Catalog (Databricks) for governance, lineage tracking, and compliance.
- Demonstrated ability to automate the end-to-end data pipeline lifecycle, including provisioning, monitoring, and scaling using tools like GitHub Actions, ArgoCD, and Terraform.
- Experience implementing observability for data workflows using Prometheus, Grafana, Jaeger, and Fluentd to enable proactive troubleshooting and ensure SLAs (a brief instrumentation sketch follows these lists).
- Proficiency in defining and monitoring data SLAs, data quality metrics, and lineage to ensure reliable production pipelines.
- Proficiency in Python for building and orchestrating data workflows, with additional experience in Rust or C++ for performance-critical components.
- Strong understanding of distributed system design and algorithmic principles required for scalable, fault-tolerant data processing.
- Knowledge of TypeScript for creating developer portals or integrations, e.g., with Backstage.
- Experience mentoring engineers and establishing standards for data infrastructure development, monitoring, and security.
- Strong communication skills to articulate and document architectural decisions and complex workflows for technical and non-technical stakeholders.

Nice to Haves
- Educational Background: Bachelor's or Master's degree in Computer Science, Data Engineering, or related fields.
- Stream-First Systems: Deep knowledge of Kappa architecture principles, with hands-on experience operating streaming systems under high throughput and low latency.
- Big Data: Experience working with petabyte-scale data and optimizing infrastructure to handle both batch and real-time analytics workloads.
- Data Governance: Familiarity with advanced data governance and compliance solutions, particularly AWS Lake Formation and Unity Catalog.
- AI/ML Workflow Automation: Experience integrating MLOps pipelines with annotation tools (e.g., LabelStudio), LLM gateways (e.g., Kong AI Gateway), or agentic frameworks (e.g., LangGraph).
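As a small illustration of the observability expectations above, the following Python sketch instruments a pipeline step with the prometheus_client library so throughput and latency can be scraped by Prometheus and charted in Grafana. The metric names, scrape port, and processing step are hypothetical examples, not part of the role description.

    # Sketch of pipeline instrumentation with prometheus_client (metric names are hypothetical).
    import time

    from prometheus_client import Counter, Histogram, start_http_server

    RECORDS_PROCESSED = Counter(
        "pipeline_records_processed_total", "Records successfully processed"
    )
    PROCESSING_LATENCY = Histogram(
        "pipeline_processing_latency_seconds", "Per-record processing latency"
    )

    def process_record(record):
        # Placeholder for a real transformation step.
        time.sleep(0.005)

    if __name__ == "__main__":
        start_http_server(8000)          # assumed Prometheus scrape port
        while True:
            with PROCESSING_LATENCY.time():
                process_record({"id": 1})
            RECORDS_PROCESSED.inc()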
Our job descriptions often reflect our ideal candidate. If you have a strong foundation of relevant skills and a passion for this field, we encourage you to apply, even if you don't check every box.

What We Offer
- Culture: Join a passionate, driven team that values collaboration, innovation, and having fun while making a difference.
- High-Impact Ownership: Your code and ideas will go live in weeks, not quarters. Every engineer owns features end-to-end and sees their work land in production with Fortune-grade customers.
- Innovation: Work on cutting-edge AI solutions that solve real-world problems and shape the future of technology.
- Growth: Opportunity for personal and professional growth as the company scales.
- Flexible Work Culture: Benefit from a flexible work environment that promotes work-life balance and remote work.
- Competitive Compensation: Receive a competitive salary, performance bonuses, equity participation, and a generous benefits package.

Compensation Range
- $180,000 - $200,000 base
- $50,000 - $60,000 variable compensation

The actual compensation offered to a candidate may vary from the posted hiring range based upon geographic location, work experience, education, and/or skill level. The ratio between base pay and target incentive (if applicable) will be finalized at the offer stage.

At IR we celebrate, support, and thrive on difference for the benefit of our employees, our products, and our community. We are proud to be an Equal Employment Opportunity employer and encourage applications from all suitable candidates; we never discriminate based on race, religion, national origin, gender identity or expression, sexual orientation, age, or marital, veteran, or disability status.