Vantaca
Lead Infrastructure Engineer (HOAi)
Vantaca, Wilmington, North Carolina, United States, 28412
Overview
The Lead AI Infrastructure Engineer at HOAi is responsible for scaling and maintaining the infrastructure that powers our AI-driven products and services. This role sits at the intersection of infrastructure engineering, machine learning operations, and product development, ensuring our AI systems operate with exceptional reliability, performance, and efficiency. The ideal candidate is someone who gets excited about making AI systems fundamentally faster and more scalable. You'll work directly with our engineering and product teams to build the foundational infrastructure that enables HOAi to deliver the most advanced AI product in the community association management industry.
Accountability & Initiatives
- Infrastructure Ownership: Design, build, and maintain the cloud architecture, model serving infrastructure, and ML pipelines that power HOAi's products
- Performance Optimization: Profile and optimize AI workloads to achieve sub-second inference latency while managing costs effectively
- Scalability & Reliability: Build auto-scaling systems, implement robust failover mechanisms, and ensure 99.99% uptime for mission-critical AI services
- MLOps Excellence: Develop and maintain CI/CD pipelines for model deployment, monitoring, and versioning across development and production environments
- Developer Enablement: Create tooling and infrastructure that allows product engineers to deploy AI features quickly and safely
- Security & Compliance: Implement security best practices and ensure compliance requirements are met across all AI infrastructure
Expectations for Success
- Infrastructure uptime and reliability
- AI inference latency (p95, p99) and throughput metrics
- Infrastructure cost efficiency and optimization (cost per inference, GPU utilization)
- Time to deploy new models and workflows (deployment velocity)
- Developer satisfaction and productivity using AI infrastructure tools
- System observability and incident response time
Responsibilities
Performance & Scalability
- Profile and optimize database queries, API endpoints, and ML inference pipelines
- Implement caching strategies, connection pooling, and distributed systems for scale
- Monitor and optimize GPU utilization, memory usage, and compute costs
- Design load balancing and auto-scaling policies for variable AI workloads
- Build disaster recovery systems with redundancy
MLOps & Deployment
- Build and maintain CI/CD pipelines specifically for model deployment
- Implement model versioning, A/B testing infrastructure, and rollout mechanisms
- Create automated testing frameworks for model quality and performance regression
- Develop infrastructure for model monitoring, drift detection, and retraining workflows
- Manage experiment tracking and model registry systems
Observability & Reliability
- Implement comprehensive monitoring, logging, and alerting across the AI stack
- Refine dashboards for real-time visibility into system health and performance
- Conduct post-mortems and implement reliability improvements
- Design circuit breakers, retry logic, and graceful degradation for critical services
Security & Compliance
- Refine security best practices for AI infrastructure and data handling
- Ensure compliance with data privacy regulations and industry standards
- Manage credentials and access control across infrastructure
- Support security audits and vulnerability assessments
Collaboration & Documentation
- Work closely with the Product & Engineering team to understand infrastructure needs and to enable fast, safe feature deployment
- Document infrastructure architecture, runbooks, and operational procedures
- Mentor team members on infrastructure best practices and tooling
- Contribute to technical strategy and architectural decisions
Requirements
Required Experience
- 3-7 years of experience in infrastructure engineering, DevOps, or SRE
- Strong cloud platform expertise
- Experience building and maintaining deployment pipelines
- Experience with PostgreSQL, Redis, or other production databases
- Experience with APM tools, metrics, logging, and alerting
- Familiarity with vector databases, model serving frameworks, and cross-system observability and traceability
- Managing and optimizing GPU work
- Real-time inference with low-latency serving infrastructure
- LLM deployment
- Track record of achieving 10x performance improvements
Skills & Competencies
- Able to debug complex distributed systems and find root causes
- Obsessed with latency, throughput, and resource efficiency
- Defaults to automating repetitive tasks and building scalable solutions
- Understands security implications and implements best practices
- Able to explain complex technical concepts clearly
- Works effectively across teams and functions
- Takes initiative to identify and solve problems before they become critical
- Comfortable with ambiguity and changing priorities in a fast-moving startup
- Supporting A/B deployment strategy
Mindset & Approach
- Extreme ownership: Takes full responsibility for outcomes, not just inputs
- Customer-focused: Understands how infrastructure decisions impact end-user experience
- Data-driven: Makes decisions based on metrics and evidence
- Continuous learner: Stays current with evolving technologies and best practices
- Quality-focused: Builds systems that are reliable, maintainable, and elegant
- Velocity-minded: Balances speed with quality; ships incrementally and iterates
- Bar raiser: Sets and maintains high standards; elevates team performance and output quality
- Strong delegator: Empowers others effectively; distributes work based on strengths and growth opportunities
Why You Should Join Our Team
- Our eNPS is +68! (Google it, that is great).
- Benefits: Medical, Dental, and Vision kick in day one.
- Unlimited PTO (with a requirement for employees to take a minimum of one continuous week per year).
- 401(k) with Company Match.
- Remote Flexible - come to the office when needed.
- Great parental leave benefits.
- Named on Inc. 5000 list of America’s Fastest Growing Private Companies.
- Named on Inc. 5000 Vet 100 Private Companies list multiple years in a row.
- Winner of Coastal Entrepreneur Award, Technology Category.
- Active employee-led Culture Committee.
- Ongoing industry and professional development trainings available to all employees.
- Multiple leaders on the executive committee recognized as 40 under 40 recipients for contributions to business and community.
We’re playing offense to win! Our product market fit and our world-class employees make us the leader in our space. We're building something cool and people like it here.
We receive many resumes for our open positions, and each one is reviewed by a human being on our recruiting team. We will compare your background with the qualifications and requirements for the position. If you are selected for an interview, you will receive an e-mail from someone on our recruiting team with an @vantaca.com email address. It may take some time for us to review all of the applications, so give us some time to respond. We appreciate your interest in this role.
Seniority level
Mid-Senior level
Employment type
Full-time
Job function
Information Technology
Industries: Software Development