Aldea
About Aldea
Aldea is a multi-modal foundational AI company changing the scaling laws of AI. We believe that today's models and model architectures present significant bottlenecks to the transformation of software. We are building the models that will power the next era of software.
Key Responsibilities
Multi-Environment Kubernetes Architecture
- Manage 5 distinct environments (NMS, Sandbox, Development, Staging, Production) with different security requirements, and design redundancy/failover mechanisms
Infrastructure as Code Excellence
- Develop and maintain Pulumi-based infrastructure using Python, managing complex cross-environment dependencies and VPC peering relationships
Zero-Trust Security Implementation
- Implement certificate-based VPN access with internal DNS resolution, configure WAF/security groups, and manage HashiCorp Vault integration
Comprehensive Observability
- Deploy and configure Prometheus, Grafana, Loki, Jaeger, and CloudWatch with unified monitoring across distributed infrastructure
API Platform Management
- Deploy and maintain a centralized API that manages all environments from the NMS hub, implementing automation for training jobs and inference optimization
Requirements
Must Have Qualifications
- 5+ years in DevOps, SRE, or infrastructure engineering
- Expert-level Kubernetes experience with EKS and multi-cluster management
- Strong Python programming skills for infrastructure automation and API development
- Infrastructure as Code expertise with Pulumi, Terraform, or similar tools
- Deep AWS knowledge: VPC, EKS, ECR, S3, CloudWatch, IAM, and networking
- Linux system administration and containerization with Docker
- Hands-on experience with Prometheus, Grafana, and centralized logging systems
- Network security experience including VPN, firewalls, and certificate management
Nice to Have Qualifications
- Machine Learning infrastructure experience (GPU clusters, model serving, ML pipelines)
- HashiCorp Vault administration and integration
- GitOps experience with ArgoCD or similar tools
- Service mesh experience (Istio, Linkerd)
- Database administration (PostgreSQL, Redis, Elasticsearch)
- CI/CD pipeline design and multi-cloud infrastructure experience
Benefits
Compensation & Benefits
We are a well-funded, Seed-stage company preparing for launch. We offer:
- Competitive base salary
- Performance-based bonus tied to achieving goals
- Equity participation
- Comprehensive benefits, including health, dental, vision, and paid time off
- Flexible work environment