Talent Space

Sr. DevOps Engineer - AI

Talent Space, Thousand Oaks, California, United States, 91362


We're looking for a skilled Cloud DevOps Engineer with a strong background in cloud infrastructure, automation, and DevOps for a contract-to-hire opportunity in Thousand Oaks, CA!

What You'll Do:

Design, implement, and manage scalable, reliable infrastructure in AWS to support high-availability systems.

Build and maintain Windows and Linux server environments, ensuring seamless integration across hybrid and cloud platforms.

Use Infrastructure as Code (IaC) tools like AWS CloudFormation, CDK, Terraform, and OpenTofu to automate infrastructure provisioning and configuration.

Implement configuration management using Chef to maintain consistency across Windows and Linux systems.

Design and optimize CI/CD pipelines using GitLab CI/CD, supporting automated deployment workflows for .NET applications.

Integrate generative AI services such as AWS Bedrock, Google Agentspace, and similar tools into the platform to support scalable and secure AI delivery.

Develop infrastructure to support large-scale AI/ML pipelines for model training, inference, and deployment across AWS and GCP environments.

Automate the full AI/ML model lifecycle (training, deployment, monitoring), ensuring reproducibility and smooth collaboration between data science and engineering teams.

Partner with AI engineers to deliver reusable APIs, scalable infrastructure, and tools that accelerate innovation and adoption of machine learning across the organization.

Implement robust observability, cost management, and security/privacy strategies tailored to AI workloads, including resource-efficient scaling and monitoring of inference services.

Ensure infrastructure and deployment security aligns with best practices and compliance standards.

Collaborate with software engineering teams to understand their needs and provide DevOps solutions that improve velocity and reliability.

Troubleshoot and resolve infrastructure or deployment issues across environments.

Deploy and manage monitoring and logging systems to ensure real-time visibility and proactive issue detection.

Contribute to the creation and documentation of internal DevOps best practices, standards, and tooling.

Stay current with evolving trends in cloud infrastructure, DevOps automation, and AI platform engineering.

Offer mentorship and support to junior team members, helping grow the team's technical capabilities.

Qualifications:

Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.

5+ years of professional experience in a DevOps or Site Reliability Engineering (SRE) role, with a strong track record of supporting production systems.

At least 1 year of experience working with AI services and large language models (LLMs), including integration and orchestration.

Extensive hands-on experience with Amazon Web Services (AWS) and cloud-native architectures.

Solid knowledge of both Windows and Linux server administration, including experience integrating these environments within cloud platforms.

Proven expertise with Infrastructure as Code (IaC) tools, particularly AWS CDK and Terraform.

Strong experience building and managing CI/CD pipelines, especially using GitLab CI/CD.

Experience deploying and maintaining .NET applications in cloud environments.

Deep understanding of cloud security best practices and how to implement them across infrastructure and CI/CD workflows.

Solid grasp of networking fundamentals, including TCP/IP, DNS, load balancing, and firewall configuration in cloud-based systems.

Hands-on experience with monitoring and logging tools such as New Relic, AWS CloudWatch, or similar platforms.

Strong scripting skills in languages such as PowerShell, Python, Ruby, or Bash.

Excellent problem-solving abilities and the capacity to troubleshoot complex systems efficiently.

Strong communication and collaboration skills, with the ability to work effectively across teams and departments.

Experience with containerization technologies like Docker and Kubernetes is a plus.

AWS and/or GCP certifications are a strong advantage.

Familiarity with Chef for configuration management is preferred.