CFX

Site Reliability Engineer (SRE)

CFX, Jackson, Mississippi, United States

Analyze business and product requirements and propose effective, efficient technical solutions for delivering changes and innovations to the Exchange infrastructure and landscape.
Work with a project focus group (product engineering, product management, architecture, and CTO) to compile a work breakdown structure of tasks for given deliverables and provide realistic estimates for completion of project assignments.
Design, build, maintain, and improve Exchange infrastructure and its tooling.
Ensure infrastructure elasticity and automated scalability for cost-efficient resource utilization while ensuring the system’s high availability and fault tolerance.
Collaborate with other Developers, SREs, and QA Engineers to execute full-cycle integration, functional, and regression testing.
Own and resolve all priority defects identified within the solution codebase efficiently and in a timely fashion.
Promote software changes across all environments safely and responsibly, from the Development and Staging environments through to deploying updates to the Production environment in a zero-downtime manner.
Provide effective Level 1 infrastructure technical support during business hours and, occasionally, off hours, depending on a rotation schedule.
Design, build, maintain, and improve the infrastructure monitoring tooling that is critical for both:

momentary (as-is) situational awareness and proactive incident response
future infrastructure capacity planning activities

Participate in team exercises to identify and implement areas for continuous improvement, and be proactive in putting your ideas forward.
Educate and mentor your engineering colleagues in the areas of your own expertise and domain knowledge, and be open-minded and approachable.

Requirements

5+ years of SRE experience, ideally working with one of the big cloud vendors: Amazon Web Services, Google Cloud, Microsoft Azure, etc.
Experience in designing and implementing an AWS and/or GCP setup from scratch.
Experience in architecting, building, deploying, and operating enterprise-ready container solutions on Kubernetes.
Solid experience in setting up and maintaining message broker infrastructure (Kafka, RocketMQ, etc.).
Experience in setting up a cloud persistence layer (AWS Aurora, GCP BigQuery, etc.).
Experience implementing a large service mesh with Istio or another relevant solution.
Experience building on-demand, short-lived environments (for debugging, profiling, and load-testing scenarios).
Experience with operating systems, especially good knowledge of Linux, and an understanding of network architectures.
