Apolis

Cloud Ops Lead

Apolis, New York, New York, United States, 10019

Location: New York, New York

JOB DESCRIPTION:

Architect AWS Solutions - Infrastructure Services

Experience: 12+ Years

Primary Skills: AWS Cloud Solutions and Operations, Architecting and Automation of Cloud Infrastructure, AI/ML Integration

Secondary Skills: Solid experience with automation scripting and tooling (PowerShell, AWS CLI, Python, JSON), plus familiarity with AI/ML frameworks and tools.

Responsibilities:

* Architect, design, and implement scalable, secure, and cost-optimized cloud solutions leveraging the latest AWS offerings, including generative AI services (e.g., Amazon Bedrock, SageMaker, Amazon Q, CodeWhisperer).

* Deep, hands-on technical expertise in AWS, including new services such as Lambda SnapStart, Graviton-based compute, and advanced analytics (e.g., Amazon QuickSight, Redshift Serverless).

* Support and administer AWS IaaS and PaaS services, including RDS, Athena, DynamoDB, EFS, ElastiCache, Kinesis Firehose, S3, Route 53, SNS, Lambda, Data Pipeline, and new AI/ML services.

* Implement and manage monitoring, analytics, and optimization tools (AWS CloudWatch, AWS CloudTrail, AWS Cost Explorer, and AI-driven observability tools).

* Operational understanding of securing cloud instances, including AI-powered security tools (e.g., Amazon GuardDuty, Macie, Inspector).

* Expertise in Linux administration, AWS VPC, subnet management, and troubleshooting.

* Develop and maintain cloud automation scripts/code (Perl, JSON, PowerShell, Terraform, AWS CloudFormation, CDK), including integration with AI/ML pipelines.

* Identify and resolve cloud performance bottlenecks using architectural and AI-driven performance analytics.

* Analyze AWS CloudTrail logs and aggregated log files for advanced troubleshooting, leveraging AI-based log analysis where appropriate.

* Identify and implement cost-saving strategies using AWS's latest cost management and AI-powered optimization tools.

* Understanding of Cloud Platform Engineering, SRE, and AI/ML Ops best practices.

* Good knowledge of application build/release processes and CI/CD pipelines (Jenkins, Chef/Puppet, AWS CodePipeline, CodeBuild), including integration with AI-powered DevOps tools.

* Familiarity with Agile processes and ability to collaborate with cross-functional teams (Development, Infrastructure, Security, Testing, QA, and Data Science/AI teams).

* Analyze customer business and technical requirements, assess environments for cloud and AI enablement, and advise on cloud and AI/ML solutions and risk management.

* Participate in customer cloud and AI/ML pre-sales responses and projects.

* Strong passion for technology exploration, AI/ML development, and continuous learning.

* Excellent written and verbal communication, presentation, and collaboration skills.

* Team leadership skills.

NICE TO HAVE

* Assist with backups and recovery, including AI-driven backup optimization.

* Deploy applications, tune performance, troubleshoot, maintain security, and automate routine procedures through scripting and AI-based automation.

* Integrate third-party tools, APIs, and AI/ML services in AWS environments.

* Knowledge of ServiceNow and its integration with AWS and AI/ML workflows.