Quick Read is a series of blogs aiming to provide a quick explanation for one concept. One at A Time.
What is Circuit Breaker?
Circuit breaker is a design pattern used in modern software development. It is used to detect failures and encapsulates the logic of preventing a failure from constantly recurring, during maintenance, temporary external system failure or unexpected system difficulties. The Circuit Breaker pattern also enables an application to detect whether the fault has been resolved.
How does CB improves service availability?
Almost every software system has some dependencies on the remote system, e.g. a DB instance, when the DB slows down significantly because of heavy load or not available, the request from the clients might have to wait long time until the request times out. These requests will hold the some of the critical resource of the system such as connection pool, memory, CPU. New requests for the same resource access will keep come to the system and continue to hold these resources even after the previous requests have timed out.
And requests that doesn’t need to access these remote resources will be blocked until the critical resources becomes available: the failure in one module failed more modules. With CB on the remote resource access path, the later requests would quickly fail by the CB and the resources will be quickly released. Hence, other part of the system is still available.
In sum, CB improves the overall service availability by breaking the access to the operations that might fail to prevent a bigger failure happen to the system.
How does Circuit Breaker work?
CB usually work as a proxy on the path access some dependent resource. You can wrap a protected function call in a circuit breaker object, which monitors for failures. Once the failures reach a certain threshold, the circuit breaker trips, and all further calls to the circuit breaker return with an error, without the protected call being made at all.
What’s the difference between CB and retry?
Many failures are transient, say a network packet loss or issue. These failure can be resolved by retry. However, there are failures due to unanticipated events, and that might take much longer to fix. In these situations it might be pointless for an application to continually retry an operation that is unlikely to succeed, and instead the application should quickly accept that the operation has failed and handle this failure accordingly.
The retry logic should be sensitive to any exceptions returned by the circuit breaker and abandon retry attempts if the circuit breaker indicates that a fault is not transient.
Is TIMEOUT enough to protect the system?
Timeout is a common strategy in system integration, when the access or task failed to finish within given time, the service is timed out in order to prevent infinite waiting. And timeout is often handled (sometime with retry). However, timeout on the client side usually takes too long be detected, while all the critical resource is already used up int these period of time, and caused cascaded failure.
For example: an operation that invokes a service could be configured to implement a timeout, and reply with a failure message if the service fails to respond within this period. This strategy could cause many concurrent requests to the same operation to be blocked until the timeout period expires. These blocked requests might hold critical system resources such as memory, threads, database connections, and so on.
It would be preferable for the operation to fail immediately instead of wait until timeout.
Does CB needs manual reset after break happens?
we can have the breaker itself detect if the underlying calls are working again. We can implement this self-resetting behavior by trying the protected call again after a suitable interval, and resetting the breaker should it succeed.
A circuit breaker component might have the following three status:
In the half-open status, some requests could be allowed to send through, if successful, the system transition to open status, otherwise back to closed status.
How does CB pattern work for asynchronous calls?
Asynchronous calls or tasks usually submit to the queue/worker system for execution. In some cases, the callers are less sensitive for the call execution or result. In other times, a timeout can be set up for such tasks incase they take too long.
How to install CB in the async task execution? A common technique is to put all requests on a queue, which the supplier consumes at its speed – a useful technique to avoid overloading servers. In this case the circuit breaks when the queue fills up.