Tata Consultancy Services
Java Production Support Engineer
Tata Consultancy Services, Addison, Texas, United States, 75001
Overview
Production Support Engineer role requiring extensive experience in production support, incident management, and monitoring of Java-based applications with high transaction volumes. The role involves triage, incident escalation, root cause analysis, and collaboration with cross-functional teams to maintain production stability.
Responsibilities
Leads production support triage efforts, manages bridge line troubleshooting, engages in technical research, and escalates issues to leadership as needed.
Ensures all impacts are accurately recorded and documented in the system of record; ensures that documents and wikis are kept up to date and available for use during triage; supports documentation of application flows, upstream/downstream impacts during outages, customer experience, and contacts for support needs.
Identifies and/or validates business impacts through interpretation of monitors, dashboards, and logs to communicate with leadership and vendors.
Manages activities to identify incident root cause, resolution, preventative actions, and change requests; reports on incident data quality.
Promotes and enforces production governance during triage/testing; identifies production failure scenarios, vulnerabilities, and opportunities for improvement.
Serves as a subject matter expert for applications within a portfolio; leverages extensive knowledge of application functionalities and application flows.
Assesses and prioritizes research requests, ad hoc reports, and offline incidents; delegates work as needed to team members and peers.
Runs start-of-day application health checks.
Performs traffic routing, takes servers out of tier, adds servers into tier, takes Java core dumps for root cause analysis, recycles JVMs, and warms up JVMs during production issues on an as-needed basis.
Reviews/updates monitoring requirements documents (MRDs) for proper monitoring; provides application release/change support; reviews upcoming changes and change runbooks; executes approved changes in production without errors.
Supports ARC/DR/data center isolation exercises; identifies opportunities for monitoring and automation; develops tools, dashboards, reports, and alerts using Splunk and Dynatrace to aid monitoring and day-to-day tasks.
Identifies stability and risk items in production; collaborates with various teams to remediate and ensure production environment is stable, available, and resilient.
Qualifications
Bachelor's degree in Computer Science, Information Technology, or related field
Proven experience in production support or a related role
Experience supporting Java and Java web services-based applications with high transaction volumes
Working knowledge of Splunk and Dynatrace tools to identify issues in production quickly
Basic knowledge of Unix/Linux commands to log in to servers, fetch logs, copy/delete files, and run shell scripts
Strong analytical and problem-solving skills
Familiarity with incident and problem management processes
Excellent communication skills and ability to understand customer-based requirements and expectations; strong documentation skills; highly effective at driving process improvement based on lessons learned analysis
Ability to work effectively in a team environment and independently
Willingness to work in shifts and provide weekend support on a rotational basis
Strong MuleSoft experience (4.x and up); Web Services (SOAP, REST, etc.); Windows Services experience; strong RDBMS experience (SQL Server, Oracle, DB2, etc.)
Hands-on experience with, or strong knowledge of, Autosys job scheduling
Experience with Remedy, ServiceNow, JIRA in creating/updating/closing incident tickets
Experience with testing tools such as SoapUI/Postman for SOAP/REST APIs
Employment details
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology
Industries: IT Services and IT Consulting
Must have excellent oral and written communication skills. Able to work independently with minimal supervision. Willing to work in shifts and on weekends on a rotational basis.
Salary: 100,000-120,000 per annum