RLDatix

Senior Resilience Tester Job at RLDatix in Myrtle Point

RLDatix, Myrtle Point, OR, United States, 97458

Senior Quality Engineer – Platform Resilience & Scalability | Platform Engineering | UK - Remote

RLDatix (RLD) is on a mission to help raise the standard of care…everywhere. Trusted by over 10,000 healthcare organisations around the world, our solutions help improve health and care. Our applications ensure that patients receive the best and safest care while supporting the providers who deliver it.

Joining Team RLD means being part of a global effort of over 2,000 team members in making a difference in healthcare…every day.

We’re searching for a UK-based Quality Engineer – Platform Resilience & Scalability to join our Platform Engineering team, so that we can ensure our Internal Developer Platform remains resilient, scalable, and highly available across multiple global regions. The Quality Engineer will design and execute resilience and performance testing strategies to guarantee our platform meets a 99.95% uptime SLA and scales dynamically under demanding conditions.

How You’ll Spend Your Time

Design chaos experiments using tools like Chaos Mesh, Litmus, or AWS Fault Injection Simulator to validate failure scenarios across EKS clusters and regions.
Test auto-recovery mechanisms such as Karpenter autoscaling, pod restarts, and ALB failover in order to ensure platform resilience.
Analyse performance bottlenecks in Kubernetes clusters, Istio service mesh latency, and GitOps pipeline throughput to optimise system behaviour.
Validate scalability by testing rapid scale-up scenarios and multi-region failover capabilities to support 3,000+ pods per cluster.
Define and monitor SLOs/SLIs for platform services using Honeycomb, CloudWatch, and Prometheus to maintain observability and reliability.

What Kind of Things We’re Most Interested in You Having

Strong experience in Kubernetes production environments (EKS preferred).
Proven success in chaos engineering and resilience testing using major frameworks.
In-depth knowledge of distributed systems failure modes and performance tuning.
Sincere interest in building resilient, scalable platforms that power global healthcare solutions.
A knack for working collaboratively within a fast-paced, cloud-native environment.
