Global Payments Inc.

Site Reliability Engineer

Global Payments Inc., Atlanta, Georgia, United States, 30383

We are looking for a detail-oriented and technically strong Production Support Engineer to join our API Operations team. In this critical role, you will monitor, diagnose, and resolve production incidents across our Apigee API implementations. You'll work closely with API engineering, Developer Services, Product Management, platform, and governance teams to ensure the stability, reliability, and performance of deployed models and agentic solutions across the enterprise. You will join a dynamic team passionate about learning, applying cutting-edge and cost-effective technologies, and innovating to deliver high-quality, highly available API solutions.

Responsibilities

Serve as the first line of defense for production incidents, ensuring rapid triage, root cause analysis, and resolution.

Monitor system health and performance of deployed APIs and integrating applications.

Track and investigate issues related to latency, failures, or broken integrations, escalating to the API engineering group where appropriate.

Collaborate with platform engineers to implement observability, logging, and alerting best practices for API services.

Build diagnostic tools, runbooks, and automated workflows to improve incident response time and reduce manual intervention.

Maintain knowledge bases and playbooks for repeatable troubleshooting and knowledge transfer.

Partner with governance and compliance teams to ensure incidents are documented and remediated in line with internal policy.

Contribute to retrospectives and continuous improvement efforts to harden production systems.

Must Haves

3+ years of experience in production support, site reliability engineering (SRE), or DevOps, preferably supporting Apigee APIs.

Strong understanding of cloud infrastructure (AWS, GCP) and observability tools.

Proficiency in Python or shell scripting for automation and troubleshooting.

Strong analytical, communication, and incident management skills.

Proficiency in programming languages such as Python and JavaScript.

Excellent problem-solving and analytical skills.

Excellent communication and collaboration skills.

Bonus Attributes

Bachelor’s degree in Computer Science, Engineering, or a related field.

Familiarity with big data technologies (Apache Spark, Kafka).

Experience with CI/CD tools and alerting/monitoring automation.

Familiarity with API integrations.

Abilities

Ability to work proactively with a high level of initiative and accuracy.

Ability to manage multiple assignments effectively and meet established deadlines.

Strong interpersonal skills to interact professionally with staff and stakeholders.

Excellent organizational skills and attention to detail.

Critical thinking ability applied to moderately to highly complex tasks.

Flexibility in adapting to changing business needs and priorities.

Ability to work creatively and independently with minimal supervision.

Ability to use experience and judgment to accomplish goals.

Experience in navigating organizational structures and collaborating across teams.

Travel Required: 2%

Physical Demands

Standing/Walking – minimal level

Sitting – moderate to high level

Lifting – up to 15 lbs.

Visual Concentration – high level

Work Environment – typical office environment.