Corporation Service Company

Associate Architect

Corporation Service Company, Wilmington, Delaware, US, 19894

Overview

Title: Associate Architect
Role: Site Reliability Engineer
Location: Bangalore
Work schedule: 11 am to 8 pm (Hybrid)

The Site Reliability Engineer applies software engineering principles to IT operations, focusing on the development and automation of systems and processes to improve site reliability, scalability, and efficiency. This role involves balancing the need for rapid feature development with the requirement for highly stable and available production environments.

Responsibilities

System Reliability & Automation:

Design, build, and maintain efficient and scalable systems through automation, reducing manual work and "toil."
Develop and maintain CI/CD pipelines to ensure consistent, reliable, and fast software delivery.
Plan, design, and execute configuration changes and rollouts at both the application and infrastructure levels.

Monitoring & Observability:

Define, measure, and report on key Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs).
Implement comprehensive monitoring, logging, and alerting solutions that focus on symptoms rather than causes.
Utilize error budgets to balance the pace of feature development with system stability.

Incident Response & Management:

Participate in the on-call rotation to respond to, troubleshoot, and mitigate production incidents and alerts.
Conduct blameless post-mortem/Root Cause Analysis (RCA) reviews to identify the cause of incidents and implement preventative measures.

Capacity Planning & Performance:

Proactively monitor system performance, identify bottlenecks, and drive optimization efforts.
Perform capacity planning to ensure the platform can scale to meet future user and traffic demands.

Collaboration & Mentorship:

Collaborate closely with development (Dev) teams to integrate operational and reliability best practices into the entire software development lifecycle (SDLC).
Document systems, processes, and "runbooks" to share knowledge and facilitate smooth operations.

Required Qualifications

Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
Proficiency in at least one scripting or programming language (e.g., Python, Java, Bash).
Experience with configuration management and infrastructure-as-code tools (e.g., Terraform, Ansible, Chef, Puppet).
Solid understanding of and experience with cloud computing platforms (e.g., AWS, Azure, OCI).
Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes).
Familiarity with monitoring and alerting tools (e.g., Elastic, Grafana, Splunk, Nagios).
Strong knowledge of Linux operating systems, networking, and distributed systems.

Preferred Skills

Previous experience in an SRE, DevOps, or highly automated Systems Engineering role.
Experience with large-scale data systems or database administration.
Demonstrated ability to debug and optimize code and automate infrastructure.
Excellent written and verbal communication skills, including the ability to explain complex technical concepts to non-technical audiences.
