Next Ventures
Job Description:
We are hiring a Senior Site Reliability/DevOps Engineer to drive the reliability, scalability, and security of our financial platforms. This role is ideal for a seasoned engineer with deep experience in automating infrastructure, optimizing deployments, and building fault-tolerant systems in a regulated, high-stakes environment.
Key Responsibilities:
Lead the design and implementation of resilient, scalable infrastructure using Infrastructure as Code (Terraform, CloudFormation, etc.)
Own and optimize CI/CD pipelines and deployment strategies
Proactively monitor, troubleshoot, and resolve system issues to minimize downtime
Develop and maintain comprehensive observability solutions—logging, metrics, tracing, and alerting—to ensure full visibility into system performance and reliability
Support and optimize AWS EMR clusters for data processing workloads, ensuring stability, cost-efficiency, and integration with data pipelines
Champion automation and DevOps best practices across teams
Collaborate with security and compliance teams to meet regulatory requirements
Mentor junior engineers and contribute to architectural decisions
Requirements:
8+ years in SRE, DevOps, or infrastructure engineering roles
Expert-level knowledge of AWS (including EMR), Kubernetes, and Linux systems
Strong experience with Docker, Terraform, CI/CD tools (e.g., Jenkins, GitLab CI), and scripting (Python, Bash)
Proven track record managing mission-critical systems in financial or similarly regulated industries
Deep understanding of observability tools and practices (e.g., Prometheus, Grafana, ELK, OpenTelemetry)
Hands-on experience deploying, tuning, and managing AWS EMR clusters in production environments
Preferred:
Experience with SOC 2, PCI, or other compliance frameworks
Relevant certifications (AWS, Kubernetes, etc.)