Operational resilience, one removed safeguard at a time.
DORA's Art. 11 (response and recovery) is about keeping critical functions running through a disruption and recovering from one: business-continuity and response measures for ICT-related incidents. For a financial entity, much of that resilience is built in code: the retries, circuit breakers, failovers, and timeouts that stop one component's failure from taking down a critical service. Each of those is a pull-request decision.
The shapes the same control failure takes.
Resilience is rarely lost in one dramatic change. It erodes when a safeguard that contained failure is removed because it seemed redundant. The recurring shapes:
A circuit breaker is removed
A breaker that isolated a failing dependency is dropped, so a downstream failure now cascades into the critical service instead of being contained.
A retry or backoff is removed
Retry-with-backoff around a flaky call is removed, so a transient blip becomes a hard failure that reaches the user.
A failover or redundancy path is removed
A fallback to a secondary instance, region, or provider is dropped, leaving no path when the primary fails.
Graceful degradation is removed
A path that let the service degrade (serving cached data or reduced functionality) is replaced by a hard failure of the whole function.
A timeout is removed
A timeout on a call to a dependency is removed, so a hung dependency can block threads and stall the whole service.
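To make three of these safeguards concrete, here is a minimal TypeScript sketch of a timeout, a retry with exponential backoff, and a fallback path. The helper names (withTimeout, withRetry, withFallback, backoffDelay) and their defaults are assumptions chosen for illustration, not any particular library's API.

```typescript
// Hypothetical helpers for illustration; names and defaults are assumptions.

/** Rejects if `work` does not settle within `ms` milliseconds. */
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

/** Delay before the next try after failed attempt `attempt` (0-based): base * 2^attempt, capped. */
function backoffDelay(attempt: number, baseMs: number, capMs: number): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Calls `fn` up to `maxAttempts` times, backing off exponentially between attempts. */
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 100,
  capMs = 2000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Only wait if another attempt will follow.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelay(attempt, baseMs, capMs));
      }
    }
  }
  throw lastError;
}

/** Returns the primary result, or the fallback's result if the primary fails. */
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
): Promise<T> {
  try {
    return await primary();
  } catch {
    return fallback(); // degrade instead of failing the whole function
  }
}
```

Composed, a call might read withFallback(() => withRetry(() => withTimeout(fetchStatus(id), 500)), () => readCachedStatus(id)), where fetchStatus and readCachedStatus are hypothetical. Each wrapper is a single line in a diff, which is why removing one is easy to miss in review.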
A circuit breaker removed from a critical call.
A payment-status service calls a downstream provider. A circuit breaker around that call kept the service responsive when the provider was slow. A refactor removes the breaker because it 'never trips'. Now, when the provider degrades, the calls pile up and the critical service stalls with it.
- const provider = withCircuitBreaker(rawProvider, { failureThreshold: 5 })
+ const provider = rawProvider
  const status = await provider.getStatus(paymentId)
Removing the circuit breaker means a slow or failing provider can stall the critical payment-status function instead of being isolated. Art. 11 (response and recovery) expects measures that keep critical functions resilient through a disruption. Keep the breaker (and its fallback), and if it never trips, that is the breaker doing its job, not a reason to remove it.
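The example above uses withCircuitBreaker without showing what it does. As a rough sketch of the mechanism, assuming a simple consecutive-failure breaker, it might look like the following. The CircuitBreaker class, the resetAfterMs and now options, and the call method are assumptions for illustration; only failureThreshold appears in the original diff.

```typescript
// Hypothetical sketch: the real withCircuitBreaker is not shown in the source,
// so this class and every option except failureThreshold are assumptions.
type BreakerState = "closed" | "open" | "half-open";

interface BreakerOptions {
  failureThreshold: number; // consecutive failures before the breaker opens
  resetAfterMs: number;     // how long to stay open before letting trial calls through
  now?: () => number;       // injectable clock, defaults to Date.now()
}

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  constructor(options: BreakerOptions) {
    this.failureThreshold = options.failureThreshold;
    this.resetAfterMs = options.resetAfterMs;
    this.now = options.now ?? (() => Date.now());
  }

  /** Current state; an open breaker becomes half-open once the reset window has passed. */
  currentState(): BreakerState {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetAfterMs) {
      this.state = "half-open";
    }
    return this.state;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed trial call, or too many consecutive failures, opens the breaker.
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  /** Runs `fn` unless the breaker is open; falls back on an open breaker or a failure. */
  async call<T>(fn: () => Promise<T>, fallback: () => T): Promise<T> {
    if (this.currentState() === "open") {
      return fallback(); // fail fast instead of queueing behind a sick dependency
    }
    try {
      const result = await fn();
      this.recordSuccess();
      return result;
    } catch {
      this.recordFailure();
      return fallback();
    }
  }
}
```

With a breaker like this, the payment-status call would become something like breaker.call(() => rawProvider.getStatus(paymentId), () => lastKnownStatus), where lastKnownStatus is a hypothetical cached value. This simplified version does not limit how many trial calls pass while half-open, and a slow (rather than failing) provider only counts as a failure if the call is also wrapped in a timeout.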
Resilience is something you have to be able to demonstrate.
DORA expects a financial entity to be able to show its operational resilience: continuity measures, the ability to respond to and recover from ICT incidents, and testing of all of that. A change that quietly removed a circuit breaker, a failover, or a timeout thins out that resilience, and it is visible in the diff. Catching it in review keeps the resilience you can demonstrate matching the resilience you actually have.
A review, not a resilience programme.
heygrc flags changes that touch a DORA obligation and cites the article so the fix happens in the pull request. It does not run your ICT risk management framework or your resilience testing. It catches the moment a safeguard that kept a critical function resilient is removed, at the diff. heygrc is in early access.