Tata Consultancy Services
Cloud Platform Engineer
Roles & Responsibilities
Platform Engineering:
• Application infrastructure provisioning & support
• Building OS images (golden images) in cloud infrastructure
• Ensuring reliability, stability & recoverability of the overall IT infrastructure
• Capacity planning to optimize server & application performance
• Management of server space and remotely managed shared storage
• Hardware & software updates, patching & OS upgrades
• Validation & execution of firewall policies, troubleshooting errors
• Backup and restoration, along with automated reporting tasks
• Working on IaC for server provisioning via Terraform & GitHub

Monitoring and Alerting:
• Continuously monitor the performance, availability, and health of cloud resources (virtual machines, containers, databases, load balancers, networks, applications).
• Set up and manage comprehensive monitoring tools (e.g., CloudWatch, Azure Monitor) or onboard cloud components to Dynatrace or the enterprise monitoring tool.
• Define and implement alerting mechanisms to proactively identify and notify relevant teams of issues or anomalies.
• Analyze logs, metrics, and traces to gain insights into system behavior.

Incident Management and Troubleshooting:
• Act as the first line of defense for cloud-related incidents, quickly identifying, troubleshooting, and resolving issues.
• Participate in on-call rotations to ensure 24/7 coverage for critical systems.
• Collaborate with development, security, and other IT teams to diagnose and resolve complex problems.
• Document incident resolutions and contribute to post-incident analysis (PIRs/RCAs) to prevent recurrence.

Infrastructure Management:
• Manage the day-to-day operations of cloud infrastructure, including provisioning, deprovisioning, and scaling of resources.
• Perform regular maintenance tasks such as patching, updates, and backups.
• Implement and manage cloud configuration through Terraform to maintain consistency across environments.
• Optimize resource utilization and performance.

Automation and Scripting:
• Develop and implement automation scripts (e.g., Python, PowerShell, Bash) to streamline repetitive operational tasks.
• Automate provisioning, deployment, and scaling of cloud resources using Infrastructure as Code (IaC) tools like Terraform.
• Integrate automation into CI/CD pipelines for faster and more reliable deployments.

Security and Compliance:
• Implement and enforce security best practices within the cloud environment (e.g., IAM, network security groups, encryption).
• Monitor for security vulnerabilities and threats, and respond to security incidents.

Cost Optimization and Financial Management (FinOps):
• Monitor and analyze cloud spending to identify cost-saving opportunities.
• Implement strategies for cost optimization, such as rightsizing resources, utilizing reserved instances, and identifying idle or underutilized resources.
• Generate cost reports and provide recommendations to management for budget optimization.

Cloud Governance:
• Define and enforce cloud governance policies, standards, and procedures for resource usage, security, and cost.
• Develop and maintain documentation for cloud operations processes, runbooks, and best practices.

Collaboration and Communication:
• Work closely with development teams (DevOps), cloud architects, security teams, and other stakeholders to ensure alignment and effective operations.
• Communicate clearly and concisely with both technical and non-technical audiences regarding system status, incidents, and planned changes.
• Participate in cross-functional meetings and discussions to provide operational insights.

Capacity Planning:
• Monitor resource utilization trends and forecast future capacity needs to ensure scalability and avoid performance bottlenecks.
• Plan for scaling up or down based on anticipated demand or business requirements.

Continuous Improvement:
• Proactively identify areas for improvement in cloud operations processes, tools, and infrastructure.
• Implement automation and process enhancements to increase efficiency and reduce manual effort.
• Stay up-to-date with the latest cloud technologies and best practices.

Vendor Management:
• Work with cloud service providers (AWS, Azure, GCP) to resolve complex issues, stay informed about new services, and optimize service utilization.
Salary Range: $120,000-$130,000 a year
#LI-OJ1 #LI-DR1