JPMorganChase

Lead Site Reliability Engineer

JPMorganChase, Plano, Texas, US, 75086


Assume a critical role in defining the future of a globally recognized firm and have a direct and significant impact in a realm tailored for top achievers in site reliability.

Job Description

As a Lead Site Reliability Engineer at JPMorgan Chase within the Corporate Technology division of the Global Finance Tech team, you will hold a leadership role, demonstrating strong knowledge across multiple technical domains and advising others on technical and business issues. You will lead resiliency design reviews, break down complex problems, act as a technical lead for medium to large products, and mentor other engineers. Leverage AI tools to enhance operational effectiveness and automate processes, ensuring high-quality customer service.

- Spearhead projects to enhance the reliability and stability of applications and platforms.
- Utilize data-driven analytics and AI technologies to automate detection, diagnosis, and resolution processes, elevating service levels and promoting continuous improvement.
- Engage stakeholders to establish realistic service level objectives and error budgets aligned with customer expectations.
- Exhibit advanced technical proficiency in domains such as observability stacks (e.g., Prometheus, Grafana), and proactively address technology bottlenecks.
- Apply AI-driven solutions to streamline processes and improve operational efficiency.
- Serve as the primary contact during major incidents, demonstrating swift issue resolution to prevent financial losses.
- Document and share knowledge through internal forums and communities of practice.
- Mentor team members, guiding them in adopting AI technologies to improve operational effectiveness and customer service.

Required Qualifications, Capabilities, and Skills

- Formal training or certification in site reliability engineering and 5+ years of applied experience.
- Proven success in an SRE or senior DevOps role, with expertise in SLIs/SLOs, incident management, postmortem analysis, and systems reliability.
- Expertise with observability tools (e.g., Prometheus, Grafana, Splunk, OpenTelemetry).
- Hands-on coding skills in at least one programming language, and experience with cloud platforms (AWS or GCP), Kubernetes, Terraform, and resilient CI/CD pipelines.
- Interest or experience in applying AI to operations, such as LLM-based copilots, anomaly detection, automated runbooks, autonomous agents, or RAG workflows.
- Ability to perform under pressure, adapt to uncertainty, and thrive in high-accountability environments.
- Strong organization skills, clarity in documentation and design, and effective communication, especially during incidents.

Preferred Qualifications, Capabilities, and Skills

- Experience with operational and compliance standards in banking or fintech sectors.
- Practical knowledge of LLM frameworks, AI orchestration tools, vector databases, or custom reliability agents.
- Experience with game days, chaos engineering, or failure-mode analysis.
- Background in mentoring and leading knowledge-sharing around AI and SRE best practices.

About Us

JPMorgan Chase is a leading financial services firm helping millions of households and small businesses achieve their financial goals. We focus on creating engaged, lifelong relationships and providing comprehensive financial solutions. Our benefits include competitive salaries, health coverage, wellness centers, retirement plans, and more. We value diversity and are an equal opportunity employer committed to inclusion and reasonable accommodations for all applicants.

About The Team

Our Consumer & Community Banking division serves customers through personal banking, credit cards, mortgages, and more, leading in credit card sales and digital solutions.
