TechDigital Group

Senior Technical Lead - DevOps

TechDigital Group, Bellevue, Washington, US, 98009

- Provide consulting services for improved system stability, availability, performance, and reliability.
- Assist in determining the impact of operational issues and provide input into their resolution via data extraction and quantification.
- Work through day-to-day support issues, ensure effective and timely resolution of issues in the production environment, and troubleshoot customer-impacting issues.
- Support multiple applications, specifically those running on Kubernetes, Gloo, AWS, Apigee, PCF, and GCP/Java-based systems in an enterprise environment.
- Support Gloo running on Kubernetes, Apigee OPDK and SaaS, Grafana, Prometheus, Cassandra, Postgres, and Spring Boot or Java-based applications running on Kubernetes, PCF, and Java application servers.
- Apply GitOps principles to manage infrastructure and application configurations.
- Apply monitoring and create complex alerts and dashboards for production systems.
- Provide capacity analysis and tuning analysis for Apigee and Java applications hosted on Linux and container platforms.
- Be available to provide 24x7 on-call support on a rotating basis with other team members.
- Lead efforts in troubleshooting, recovery, and root cause investigation.
- Perform analysis of user requirements and problems to automate or improve systems, and review system capabilities, workflow, and scheduling limitations.
- Follow and develop detailed work plans, schedules, project estimates, resource plans, and status reports.
- Facilitate HA (High Availability) / DR (Disaster Recovery) exercises to ensure that the team is fully prepared for any event.
- Lead root cause analysis sessions to understand what causes issues in production, and produce an RCA report with solutions that will prevent them from recurring.
- Ensure documentation is created and kept up to date for any related work.
- Demonstrate a strong understanding of UNIX operating systems and at least one scripting language.
- Forecast and plan for a rapidly growing environment.
- Evaluate new software product and service solutions.

Skill Requirements:

- Expertise in analyzing and troubleshooting large-scale distributed systems.
- Strong experience with Kubernetes container orchestration, Gloo, AWS, and the Apigee API gateway.
- Experience with REST, SOAP, and GraphQL API support.
- Experience with tools such as Git, GitLab, Docker, Postman, Splunk, AppDynamics, Imperva WAF, and CI/CD tools.
- Good experience with GitOps processes, performance measurement and tuning, capacity planning and management, contingency planning, and disaster recovery.
- Good understanding of and strong experience with Unix/Linux operating systems.
- Ability to debug and optimize code and to automate routine tasks.
- Systematic problem-solving approach coupled with effective communication skills.
- Strong scripting knowledge and experience.
- Good understanding of networking, routing, and TLS/SSL.
