TalentoHC

Site Reliability Engineer

TalentoHC, Miami, Florida, US, 33222

Site Reliability Engineer (On-site in Miami, Contract)

Talento has partnered with an enterprise organization on a search for a Site Reliability Engineer (SRE) based in Miami, FL. The SRE ensures the availability, performance, security, and reliability of the organization’s infrastructure and applications. This role focuses on building automation, improving system resilience, managing observability platforms, and supporting incident response processes. The SRE works closely with infrastructure, security, and application teams to maintain a highly scalable and stable environment.

Requirements

5+ years of experience in Site Reliability Engineering, DevOps, or related infrastructure roles

Strong knowledge of cloud platforms (AWS, Azure, or GCP)

Proficiency in automation and scripting (Python, Bash, PowerShell, etc.)

Experience with CI/CD pipelines and infrastructure-as-code tools (Terraform, CloudFormation, etc.)

Hands‑on experience with observability/monitoring tools (Datadog, Prometheus, Grafana, Splunk, etc.)

Understanding of networking, security best practices, and system performance tuning

Experience working in production environments and participating in incident response

Strong troubleshooting skills and ability to diagnose complex system issues

Responsibilities

Improve system uptime, resilience, performance, and overall security posture

Develop automation for deployments, monitoring, alerting, and infrastructure scaling

Manage observability platforms, including dashboards, alerts, and log pipelines

Lead and support incident response workflows to restore services quickly and prevent recurrences

Implement best practices for reliability engineering, performance optimization, and security hardening

Collaborate with infrastructure, cybersecurity, and application teams to ensure reliable system integrations

Maintain documentation, runbooks, and operational standards for system reliability

Continuously evaluate and implement tools to enhance performance, automation, and monitoring
