Tata Consultancy Services

Technical Lead

Tata Consultancy Services, Dallas, Texas, United States, 75215

Site Reliability Engineering - Senior Engineer

Must-have skills:

• Python or Java
• Splunk Cloud, ThousandEyes
• Cloud platforms such as AWS, Google Cloud, or Azure
• Docker and Kubernetes

Responsibilities:

• System Reliability: Work with production support teams to implement scalable, maintainable systems, continuously seeking improvements and optimizations in infrastructure and application architecture.
• Toil Reduction - Automation: Build and maintain tools and scripts that automate repetitive tasks, deployment processes, monitoring, and incident response, reducing manual intervention and minimizing human error.
• Incident Management: Participate in major incidents (on-call rotations), respond to incidents and service outages, promptly investigate and resolve system issues, and conduct post-mortems to prevent future incidents through problem management.
• Monitoring and Alerting: Establish and maintain monitoring and alerting systems to proactively identify potential issues, ensuring timely notifications to relevant teams during critical situations.
• Capacity Planning and Performance Optimization: Monitor system performance, identify bottlenecks, collaborate with engineering teams on performance optimization, and plan for future growth.
• Error Budgeting and Chaos Engineering: Diagnose and recommend optimization opportunities, and conduct mock drills to improve stability and resiliency.
• Documentation: Develop and maintain comprehensive documentation for system configurations, processes, and troubleshooting procedures to enhance knowledge sharing and team efficiency.

Minimum Qualifications:

• Knowledgeable in cloud platforms such as AWS, Google Cloud, or Azure, and familiar with containerization technologies like Docker and Kubernetes.
• Proficient in using infrastructure-as-code tools like Terraform and Ansible for automation and configuration management.

Preferred Qualifications:

• Experienced in software development with proficiency in programming languages like Python or Java.
• Familiar with monitoring and logging tools such as Splunk Cloud and ThousandEyes.
• Understands networking principles and protocols.
• Capable of working collaboratively in a fast-paced, dynamic environment with excellent problem-solving skills.

Salary Range: $120,000-$130,000 a year
