SpringbokIT
We're seeking a skilled and proactive individual to help maintain and improve the stability, availability, and efficiency of our systems. In this role, you'll collaborate closely with both development and operations teams to enhance our infrastructure, support application delivery, and optimize for cost and performance.
Key Responsibilities:
Contribute to designing and deploying scalable, dependable systems using Kubernetes, Docker, and Istio
Analyze system performance and recommend optimizations for responsiveness, uptime, and throughput
Monitor production environments and manage incidents using observability tools like Datadog
Write automation scripts to streamline deployment, monitoring, and infrastructure management
Apply GitOps practices to ensure reliable, traceable production deployments
Work with engineers to identify and troubleshoot system reliability issues
Perform load testing to confirm capacity for upcoming product changes or launches
Implement progressive deployment strategies such as A/B testing, canary releases, and traffic mirroring
Support high-volume systems on AWS, including EKS clusters, load balancing, and network routing
Maintain high service availability and user experience while optimizing cloud spend
Participate in global on-call rotations to support production reliability
Create and maintain internal documentation and promote knowledge sharing
Assist in applying best practices for system resiliency and operational excellence
Qualifications:
2+ years of experience in SRE, DevOps, or infrastructure-related roles
Working knowledge of AWS services
Hands-on experience with containerization, orchestration, and service mesh (Docker, Kubernetes, Istio)
Familiarity with observability platforms like Datadog, Prometheus, Grafana, AppDynamics, or ELK
Understanding of auto-scaling using Horizontal Pod Autoscalers (HPAs)
Experience with GitOps tools such as Argo CD
Familiarity with deployment techniques like blue/green, canary, and traffic splitting
Proficiency with infrastructure-as-code and automation tools (e.g., Terraform, Ansible)
Awareness of cloud cost optimization principles
Strong analytical and troubleshooting skills
Adaptability in learning and applying new tools and technologies
Self-motivated, detail-oriented, and accountable
Excellent collaboration and communication abilities
High standards for work quality and integrity
Experience with Golang or Rust is a plus, but not mandatory
Job Details
Seniority level: Mid-Senior level
Employment type: Contract
Job function: Information Technology
Industries: Technology, Information and Media