Optomi
About the role
No C2C or third-party sponsorship available. On site five days a week at the client's Alpharetta, GA or Berkeley Heights, NJ office. Year-long contract with potential conversion or extension.
Responsibilities
Design, build, and maintain highly available, scalable, and secure cloud infrastructure on platforms such as AWS, GCP, or Azure.
Develop and implement automation for provisioning, monitoring, scaling, and incident response using Infrastructure-as-Code tools (e.g., Terraform, CloudFormation, Ansible).
Monitor system reliability, capacity, and performance; proactively detect and address issues before they impact users.
Respond to production incidents, participate in on-call rotations, and lead post-incident reviews to drive root cause analysis and reliability improvements.
Collaborate with software engineering and security teams to ensure new services and features are production-ready and meet reliability standards.
Build and maintain tools for deployment, monitoring, and operations; automate manual processes to reduce toil.
Document operational processes and system architectures to ensure knowledge sharing and repeatability.
Continuously evaluate and implement new technologies to improve system reliability, security, and efficiency.
Qualifications
Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
3+ years of experience in software development with proficiency in at least one programming language (e.g., Python, Go, Java, C++).
Experience administering cloud platforms (AWS, GCP, Azure), including networking, security, containerization, storage, data management, and serverless technologies.
Solid understanding of Linux systems, networking fundamentals, virtualized and distributed systems, file systems, and system processes and configurations.
Deep understanding of observability (monitoring, alerting, and logging) tools in cloud environments.
Ability to set up and maintain monitoring dashboards, alerts, and logs.
Familiarity with CI/CD tools for automated testing, deployments, provisioning, and observability.
Ability to manage and respond to incidents, perform root cause analysis, and conduct post-mortem reviews.
Understanding of setting, monitoring, and maintaining SLOs and SLAs for system reliability.
Additional qualifications
Experience working with enterprise-scale financial services or other regulated industries.
5+ years of experience in SRE, DevOps, infrastructure, or cloud engineering roles, preferably supporting large-scale, distributed systems.
Excellent problem-solving, troubleshooting, and communication skills.
Experience leading technical projects or mentoring junior engineers.
Certifications: Certified Engineer, DevOps, SRE, CSREF.
Seniority level
Mid-Senior level
Employment type
Full-time
Job function
Not specified in this description.