Itcharm

DevOps Engineer (AWS)

Itcharm, Work From Home

About Us and the Role

Itcharm is a fast-growing software development company delivering innovative solutions to clients around the world, with a strong focus on the US market. Headquartered in sunny San Diego, California, we’ve built a team of over 300 talented professionals across multiple countries. Our success comes from a deep commitment to results and long‑term partnerships built on trust.

We’re launching a transformative partnership with a new US‑based client in the data quality and enrichment space who is embarking on a bold modernization initiative. This client is transitioning from a fragmented legacy ecosystem to a fully cloud‑native, scalable platform built on AWS. The transformation is not just technical; it’s strategic, touching every layer of infrastructure, product delivery, and operational efficiency.

As a DevOps Engineer, you’ll be at the center of this journey, helping architect the systems and practices that will support long‑term growth, automation, and innovation. You’ll play a critical role in designing and implementing the infrastructure and DevOps strategy that enables this transformation. This includes solving deep‑rooted challenges around scalability, release orchestration, cost optimization, and environment consistency, while laying the foundation for a secure, observable, and automated cloud environment.

This is a hands‑on role with strategic influence, ideal for someone who thrives in high‑autonomy environments and enjoys solving complex infrastructure challenges.

Tech Stack Snapshot

  • Cloud Provider: AWS
  • Compute & Orchestration: ECS, Fargate, Elastic Beanstalk, Lambda, EC2 (legacy)
  • Storage & Data: S3, RDS (PostgreSQL), Aurora
  • IaC: Terraform (selective use, expanding)
  • Containers & Configuration: ECR, AppConfig
  • Messaging & Coordination: SNS, SQS
  • Monitoring & Logging: CloudWatch, RDS Console
  • CI/CD: GitHub Actions, manual deployments (transitioning to full automation)

Key Responsibilities

Infrastructure & Environment Management

  • Design and provision isolated environments for development, QA, staging, and production using AWS best practices.
  • Standardize infrastructure provisioning using Terraform, ensuring consistency and version control across services.
  • Improve IAM role management and automate access provisioning to support secure and flexible operations.
  • Define ownership and review protocols for infrastructure changes and environment templates.

CI/CD & Deployment Automation

  • Architect and implement unified CI/CD pipelines across a diverse service landscape using GitHub Actions.
  • Integrate automated testing, linting, security scanning, and deployment validation into every pipeline.
  • Formalize and enforce a branching strategy to improve release management, collaboration, and CI/CD stability.
  • Introduce rollback strategies (e.g., blue/green or canary deployments) to support safe and resilient releases.
  • Establish isolated QA environments to support pre‑production testing and reduce deployment risk.

Cloud Architecture & Scalability Engineering

  • Implement auto‑scaling policies based on real‑time load metrics using ECS, Fargate, and Lambda.
  • Conduct performance simulations to validate scaling behavior and forecast capacity needs.
  • Address architectural bottlenecks, particularly in the database layer (e.g., write throughput, replication latency).
  • Define and monitor non‑functional scalability requirements (NFRs) and continuously improve system responsiveness.

Monitoring, Observability & Incident Response

  • Introduce full‑stack observability tools (e.g., Datadog, AWS X‑Ray, New Relic) for distributed tracing and performance insights.
  • Implement centralized logging using ELK stack or CloudWatch Logs Insights.
  • Define and monitor SLOs/SLIs for critical services and set up alerting and dashboards for ingestion pipelines and APIs.
  • Participate in incident response and postmortems, driving continuous improvement in system reliability and recovery.

Security & Compliance

  • Enforce secure configuration management and secrets handling across environments.
  • Support SOC 2 and GDPR compliance efforts through infrastructure‑level controls and audit readiness.
  • Evaluate and implement network segmentation, VPC isolation, and firewall rules to enforce boundaries and reduce risk.

Cloud Cost Optimization & Governance

  • Audit AWS usage and apply cost optimization techniques including reserved instances, savings plans, and storage tiering.
  • Implement lifecycle policies for S3 and other storage services to manage data retention and cost.
  • Build dashboards to visualize resource usage, monitor cost trends, and align infrastructure decisions with budget targets.

Collaboration & Enablement

  • Work closely with engineering teams to unblock delivery and improve deployment confidence.
  • Support QA and product teams by stabilizing environments and improving release visibility.
  • Champion DevOps best practices across teams and foster a culture of automation, ownership, and continuous delivery.

Required Qualifications

  • 5+ years of experience in DevOps, Site Reliability Engineering, or Cloud Infrastructure roles, ideally within high‑availability, distributed systems.
  • Deep expertise in AWS, including hands‑on experience with ECS, Lambda, RDS, S3, IAM, and messaging services like SNS/SQS.
  • Proficiency in Terraform for infrastructure provisioning and lifecycle management, with a strong understanding of version control and CI/CD integration.
  • Hands‑on experience with GitHub Actions, including pipeline design, deployment automation, and quality gate implementation.
  • Strong grasp of CI/CD principles, branching strategies, and deployment methodologies (e.g., blue/green, canary).
  • Solid understanding of containerization and orchestration using Docker, Fargate, and EKS, with experience managing container images via ECR.
  • Familiarity with monitoring and observability tools, such as CloudWatch, Datadog, ELK stack, and AWS X‑Ray, with the ability to define and track SLOs/SLIs.
  • Working knowledge of security best practices, including secrets management, IAM policies, and infrastructure controls aligned with SOC 2 and GDPR compliance.
  • Excellent communication and collaboration skills in English, with the ability to work cross‑functionally and support engineering, QA, and product teams.

Preferred Qualifications

  • Experience with PostgreSQL performance tuning, replication strategies, and backup automation.
  • Exposure to data pipelines and ETL workflows, especially in batch or event‑driven architectures.
  • Familiarity with cloud cost optimization techniques, including reserved instances, storage tiering, and usage forecasting.
  • Experience supporting QA automation, test environments, and release validation workflows.

Why choose us?

  • We’re a company where curiosity fuels everything – from new ideas to personal growth.
  • We believe that common sense sits at the heart of everything we do. It helps us stay focused, make smart decisions, and move fast without the noise.
  • We trust each other to follow through and own the outcome: no micromanagement, just real commitment.
  • You’ll be heard here, even when you challenge the status quo. Because courage matters. Who dares wins.
