Exadel Inc
Senior Application Reliability Engineer (Java)
Bulgaria, Poland
Why Join Exadel
We’re an AI-first global tech company with 25+ years of engineering leadership, 2,000+ team members, and 500+ active projects powering Fortune 500 clients, including HBO, Microsoft, Google, and Starbucks.
From AI platforms to digital transformation, we partner with enterprise leaders to build what’s next.
What powers it all? Our people are ambitious, collaborative, and constantly evolving.
What You’ll Do
Reliability Improvements Within Java Applications
Review current Java services and identify reliability gaps
Introduce patterns such as rate limiting, backpressure, traffic shedding, and circuit breakers
Support uplift plans that raise applications to an agreed level of resilience
Guide development teams toward practical changes that improve stability and consistent delivery
Production Experience With Java Systems
Troubleshooting and root cause analysis involving memory issues, thread behavior, and runtime failures
Load testing or stress testing experience using tools such as JMeter
OpenTelemetry Instrumentation in Java Code
Add metrics, logs, and traces using OpenTelemetry libraries
Ensure the Java service exposes meaningful telemetry
Use the client’s existing platform for ingestion and dashboards
Collaboration and Client Interaction
Explain reliability concepts clearly and simply to product owners and senior stakeholders
Lead conversations about SLOs, SLIs, service availability, and service behavior
What You Bring
Demonstrated expertise in SLOs, SLIs, and error budgets linked to release velocity and operational decisions
Experience creating or improving Service Level Contracts for distributed systems
Ability to apply failure-mode analysis, chaos engineering, and resilience patterns to assess readiness
Deep understanding of traffic management: rate limiting, backpressure, load shedding, circuit breakers
Experience designing or evaluating progressive delivery (canary, feature flags, blue/green) with reliability focus
Strong familiarity with container orchestration (K8s, ECS) from a production reliability perspective
Ability to assess CI/CD maturity, test coverage, pipeline reliability, and deployment safety
Conducting reliability audits, maturity reviews, gap analysis, and producing improvement roadmaps
Hands‑on experience with observability: distributed tracing, structured logging, actionable metrics (RED/USE)
Designing alerting strategies that reduce noise and improve actionability
Participation in incident response, on‑call rotations, blameless post‑mortems, and driving systemic improvements
Ability to communicate reliability trade‑offs and influence engineering, product teams, and leadership
Intermediate+
Legal & Hiring Information
Exadel is proud to be an Equal Opportunity Employer committed to inclusion across minority, gender identity, sexual orientation, disability, age, and more
Reasonable accommodations are available to enable individuals with disabilities to perform essential functions
Please note: this job description is not exhaustive. Duties and responsibilities may evolve based on business needs
Your Benefits at Exadel
Exadel benefits vary by location and contract type. Your recruiter will fill you in on the details.
International projects
In‑office, hybrid, or remote flexibility
Medical healthcare
Recognition program
Ongoing learning & reimbursement
Team events & local benefits
Sports compensation
We lead with trust, respect, and purpose. We believe in open dialogue, creative freedom, and mentorship that helps you grow, lead, and make a real difference. Ours is a culture where ideas are challenged, voices are heard, and your impact matters.