Soal Technologies Inc
W2 candidates only
US citizens / Green Card holders only

Job Description:
We are looking for a highly qualified and motivated AI-Ops Engineer.

Responsibilities:

AI-Driven Operations & Automation
• Implement AIOps solutions using ML to automate performance monitoring, workload scheduling, and infrastructure operations.
• Build anomaly detection systems to identify system issues before they impact users.
• Develop automated root cause analysis using ML-driven event correlation.
• Create predictive maintenance workflows based on historical patterns and telemetry data.
• Design and execute automated remediation scripts for incident response.

Observability & Intelligent Monitoring
• Build observability platforms that aggregate logs, metrics, and events into unified dashboards.
• Implement intelligent alerting using NLP/ML to reduce noise and prioritize actionable insights.
• Deploy APM tools integrated with AI-powered analytics.
• Ensure full visibility across cloud infrastructure, applications, and ML workloads.

Cloud Infrastructure & DevOps
• Design and maintain scalable AWS infrastructure using CloudFormation, Terraform, or CDK.
• Build and manage containerized workloads (Docker, ECS, Fargate, EKS).
• Create CI/CD pipelines incorporating AI-driven deployment and quality checks.
• Automate cloud operations to optimize cost, scalability, and reliability.
• Ensure all cloud architecture meets Stanford’s compliance requirements (FERPA, GDPR).

Collaboration & Continuous Improvement
• Partner with engineers and cross-functional teams to deliver AIOps capabilities.
• Use Git-based workflows and participate in code reviews.
• Document runbooks, automation workflows, and operational procedures.
• Continuously evaluate emerging AIOps tools and methodologies.
• Contribute to building a culture focused on predictive and automated operations.

Qualifications Required:
• Bachelor’s degree in Computer Science, DevOps, Cloud Engineering, or related field (Master’s preferred).
• 3+ years in DevOps, SRE, or Cloud Engineering roles.
• 2+ years hands-on experience with AWS (EC2, Lambda, ECS/Fargate, S3, IAM, VPC).
• Strong Python programming skills.
• Experience implementing monitoring and observability solutions at scale.
• Familiarity with ML/AI concepts applied to automation.

Technical Skills:
• Languages: Python required; Bash, Go, or TypeScript preferred.
• Monitoring Tools: CloudWatch, X-Ray, Prometheus, Grafana, Datadog, Splunk.
• Infrastructure as Code: CloudFormation, Terraform, CDK.
• Containers & Orchestration: Docker, ECS/Fargate, Kubernetes (EKS).
• AWS Services: Lambda, EC2, S3, API Gateway, EventBridge, CloudWatch, IAM, CodePipeline, SageMaker.
• CI/CD: GitHub Actions, CodePipeline, Jenkins, GitLab CI.
• Data & Analytics: Log aggregation, metrics analysis, event correlation.

Desired Attributes:
• Strong understanding of AIOps principles and automation-first operations.
• Passion for eliminating manual work through AI-driven solutions.
• Excellent debugging and root cause analysis skills.
• Adaptable, collaborative, and eager to learn with strong communication skills.
• Thrives in fast-paced environments with evolving technology stacks.

---
Hamza Jawed | Technical Recruiter
C) 737.301.0255
hjawed@soaltech.com