Straiker

DevOps Engineer / Senior DevOps Engineer

Straiker, Sunnyvale, California, United States, 94087

Your talent is powerful; protect it wisely with us!

AI is the biggest technology shift of our lifetime, reshaping how people work and play! At Straiker, our mission is to ensure businesses can embrace AI with confidence. Founded by seasoned AI and security entrepreneurs and backed by premier venture partners like Lightspeed and Bain Capital, we’re building highly fine-tuned AI technology to protect enterprises against the next generation of AI threats. If you are a dreamer, a self-starter, and a high-IQ, low-ego individual who wants to join us in our mission to secure the future with AI, Straiker is the right place for you. Are you in?

Location:

SF Bay Area

Job Type:

Full-Time

Job Description:

Straiker is an AI startup backed by top Silicon Valley VCs with a mission to help enterprises embrace GenAI by providing a layer of security, safety, and trust. At Straiker, we will use AI to secure AI. As a DevOps Engineer at Straiker, you will be instrumental in building and maintaining the infrastructure that powers our AI detection cloud platform. You will architect, implement, and optimize our deployment pipelines, cloud infrastructure, and operational systems to ensure the high availability, scalability, and security of our AI services.

Key Responsibilities

Infrastructure as Code (IaC):

Design, implement, and maintain infrastructure using tools like Terraform, CloudFormation, or Pulumi to ensure reproducible and scalable environments across development, staging, and production.

CI/CD Pipeline Management:

Build, optimize, and maintain continuous integration and deployment pipelines using tools like Jenkins, GitLab CI, GitHub Actions, or CircleCI to enable rapid and reliable software delivery.

Container Orchestration:

Deploy, manage, and scale containerized applications using Kubernetes, including cluster management, service mesh implementation, and optimization of container workloads.

Cloud Architecture:

Design and implement cloud-native solutions on AWS, Azure, or Google Cloud, including auto-scaling, load balancing, and disaster recovery strategies.

Monitoring & Observability:

Implement comprehensive monitoring, logging, and alerting solutions using tools like Prometheus, Grafana, ELK stack, or Datadog to ensure system health and performance.

Security & Compliance:

Implement security best practices, manage secrets and credentials, ensure compliance with industry standards, and conduct regular security audits of infrastructure.

Automation & Scripting:

Develop automation scripts and tools using Python, Bash, or Go to streamline operations, reduce manual tasks, and improve system reliability.

AI/ML Infrastructure:

Build and maintain specialized infrastructure for AI model training, fine-tuning, and deployment, including GPU cluster management and ML pipeline optimization.

Performance Optimization:

Analyze and optimize system performance, implement caching strategies, and ensure efficient resource utilization across all environments.

Incident Response:

Lead incident response efforts, perform root cause analysis, and implement preventive measures to minimize downtime and service disruptions.

Collaboration:

Work closely with software engineers, ML engineers, and security teams to ensure seamless integration of DevOps practices throughout the development lifecycle.

Documentation:

Create and maintain comprehensive documentation for infrastructure, deployment processes, and operational procedures.

Qualifications

Bachelor's or Master's degree in Computer Science, Engineering, or related field

3-6 years of experience in DevOps, Site Reliability Engineering, or Infrastructure Engineering roles (2+ years may be sufficient with a Master’s degree)

Strong expertise in cloud platforms (AWS, Azure, or GCP) with relevant certifications preferred

Proficiency in Infrastructure as Code tools (Terraform, CloudFormation, Ansible)

Extensive experience with containerization (Docker) and orchestration (Kubernetes)

Strong scripting skills in Python, Bash, or Go

Experience with CI/CD tools and GitOps practices

Solid understanding of networking, security, and Linux system administration

Experience with monitoring and observability tools (Prometheus, Grafana, ELK, Datadog)

Strong problem-solving skills and ability to work in a fast-paced startup environment

Excellent communication skills and ability to work effectively with cross-functional teams

Preferred Skills

Experience with AI/ML infrastructure and MLOps practices

Knowledge of service mesh technologies (Istio, Linkerd)

Experience with serverless architectures and event-driven systems

Familiarity with database administration (both SQL and NoSQL)

Experience with message queuing systems (Kafka, RabbitMQ, SQS)

Understanding of FinOps practices and cloud cost optimization

Experience with compliance frameworks (SOC2, HIPAA, GDPR)

Knowledge of chaos engineering and resilience testing practices

Contributions to open-source DevOps tools or infrastructure projects are a plus

Seniority Level:

Mid-Senior level

Employment Type:

Full-time

Job Function:

Software Development