Circuit Breakers in Microservices | Hacker Noon


The circuit breaker is a design pattern used extensively in distributed systems to prevent cascading failures. In this post, we’ll walk through the problem of cascading failures and see how the circuit breaker pattern addresses it.

Motivation: The problem of cascading failures

Before jumping into the circuit breaker pattern, let’s try to understand the problem it solves.
When service A calls service B, it allocates a thread to make that call. Two kinds of failure can occur during the call. As an example, consider a user service calling a friends service:

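# user service
class InternalServerError(Exception):
    """Hypothetical error the user service surfaces to its own caller."""

def get_user_info(user_id: str):
    try:
        # This thread blocks until friends_service responds or the call fails.
        return friends_service.get_friends(user_id)
    except Exception as e:
        raise InternalServerError from e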

Immediate Failures: The exception is raised right away (for example, Connection Refused), and service A’s thread is freed.

Timeout Failures: If service B takes a long time to respond, each new request to service A ties up another thread waiting on service B. If enough requests arrive while earlier ones are still waiting to time out, they can exhaust service A’s thread pool and bring service A down.
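The thread-pool effect can be reproduced with a small simulation. This is a sketch built on made-up numbers. Service A has a pool of 2 threads (`POOL_SIZE`), and every call to service B takes 0.2 s (`SLOW_CALL_S`), standing in for a slow dependency. Each blocked call holds a thread, so later requests sit in the queue before they even start. `handle_request` and `simulate` are hypothetical helpers.

```python
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 2      # hypothetical: size of service A's thread pool
SLOW_CALL_S = 0.2  # hypothetical: how long service B takes to answer

def handle_request(enqueued_at: float) -> float:
    """Serve one request to service A; return how long it waited for a free thread."""
    waited = time.monotonic() - enqueued_at
    time.sleep(SLOW_CALL_S)  # the worker thread is blocked on service B here
    return waited

def simulate(num_requests: int) -> list[float]:
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        futures = [pool.submit(handle_request, time.monotonic()) for _ in range(num_requests)]
        return [f.result() for f in futures]

waits = simulate(6)
for i, w in enumerate(waits):
    print(f"request {i}: waited {w:.2f}s for a thread")
```

Requests 0 and 1 start immediately. Each later pair waits roughly one more `SLOW_CALL_S`. If service B hangs instead of answering slowly, the queue only keeps growing.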

“Your code can’t just wait forever for a response that might never come, sooner or later, it needs to give up. Hope is not a design method.” -Michael T. Nygard, Release It!
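The sketch below contrasts the two kinds of failure and shows what "giving up" looks like in code. It uses only Python’s `socket` module against two local stand-ins:

- a port with nothing listening, which gives an immediate Connection Refused;
- a listener that accepts connections but never replies, which gives a hang.

The `call_dependency` helper, the request bytes, and the 0.5 s timeout are illustrative assumptions. Because the socket has a timeout, the hanging call raises `socket.timeout` instead of blocking its thread forever.

```python
import socket
import time

def call_dependency(port: int, timeout_s: float) -> str:
    """Make one request to a local stand-in service and report how the call ended."""
    try:
        # The timeout applies to the connect and to each later blocking socket call.
        with socket.create_connection(("127.0.0.1", port), timeout=timeout_s) as conn:
            conn.sendall(b"GET /friends HTTP/1.0\r\n\r\n")
            conn.recv(1024)
            return "response"
    except ConnectionRefusedError:
        return "refused"    # immediate failure: nothing is listening on the port
    except socket.timeout:
        return "timed out"  # the caller gave up instead of waiting forever

# Immediate failure: grab a free port, then close it so nothing listens there.
probe = socket.socket()
probe.bind(("127.0.0.1", 0))
closed_port = probe.getsockname()[1]
probe.close()

# Timeout failure: a listener that accepts connections but never replies.
hung = socket.socket()
hung.bind(("127.0.0.1", 0))
hung.listen()
hung_port = hung.getsockname()[1]

refused_result = call_dependency(closed_port, timeout_s=0.5)
start = time.monotonic()
hung_result = call_dependency(hung_port, timeout_s=0.5)
hung_elapsed = time.monotonic() - start
hung.close()
print(refused_result, hung_result, f"{hung_elapsed:.1f}s")
```

A timeout bounds how long each thread is held. It does not stop new requests from being sent to a dependency that keeps timing out; that is the gap the circuit breaker fills.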

Let’s walk through an example of a social media application to understand this better. Here we have an aggregator service, which is what the client interacts with. It aggregates results from a number of services, including the user service. The user service calls the photo service and the friends service, which in turn calls friends_db.

Here, the friends service makes requests to friends_db. However, friends_db does not fail immediately; instead, it keeps the friends service’s threads waiting. The friends service retries, using up even more threads. As new requests arrive, more and more threads are left waiting for friends_db to respond.

We can now see how the friends service becomes the source of timeouts for the user service. The user service exhausts its thread pool waiting for responses from the friends service, just as the friends service was waiting for friends_db. A failure in friends_db has caused a cascading failure in services that depend on it only indirectly.

Eventually, the aggregator service goes down for the same reason. Since the client calls the aggregator service, our system is effectively shut down for users. One error in a single component of our architecture caused a cascading failure that brought all the other services down.
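Little’s law gives a back-of-the-envelope view of why each pool runs dry. The average number of threads busy waiting on a dependency, L, equals the request arrival rate λ times the time W that each request holds a thread. Take hypothetical figures of 100 requests/s and a 30 s timeout:

```latex
L = \lambda W
  = 100\,\tfrac{\text{requests}}{\text{s}} \times 30\,\text{s}
  = 3000\ \text{threads}
```

Any pool with fewer than 3,000 threads is exhausted under these numbers. Every service that calls it then inherits the same problem.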

Circuit Breaker Pattern

The circuit breaker is usually implemented as an interceptor, chain of responsibility, or filter. It has three states:

  • Closed: All requests are allowed to pass to the upstream service, and the interceptor passes the upstream service’s response back to the caller.
  • Open: No requests are allowed to pass to the upstream service. The interceptor responds with a default response, generally an error response.
  • Half-Open: Some requests are allowed to pass to the upstream service, while the others are terminated and answered with a default response.

The following shows the circuit breaker interceptor in each of its three states:

The circuit breaker is implemented as an interceptor that sits between the user service and the friends service and sees every request. In this picture, it is in the “closed” state and passes all requests through to the friends service.

The circuit breaker switches to the “open” state when the number of failed calls to the friends service exceeds the failure threshold. It no longer lets requests from the user service reach the friends service; instead, it responds immediately with a default response.

After a set “recovery timeout” period has passed, the circuit breaker switches to a “half-open” state. It lets some requests reach the friends service, while the rest are terminated and answered with the default response. If the trial requests succeed, the breaker closes again; if they fail, it returns to the open state.
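The class below is a minimal, single-threaded sketch of the state machine described above. It is not the implementation of the circuitbreaker package, and the names `SimpleCircuitBreaker`, `State`, and `CircuitOpenError` are hypothetical. It makes simplifying assumptions:

- it counts consecutive failures;
- it lets exactly one trial call through in the half-open state;
- it is not thread-safe.

```python
import time
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")

class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the circuit is open."""

class SimpleCircuitBreaker:
    def __init__(self, failure_threshold: int, recovery_timeout: float,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.recovery_timeout = recovery_timeout    # seconds to stay open before a trial call
        self.clock = clock
        self.state = State.CLOSED
        self.failure_count = 0
        self.opened_at = 0.0

    def call(self, func: Callable[[], T]) -> T:
        if self.state is State.OPEN:
            if self.clock() - self.opened_at >= self.recovery_timeout:
                # Recovery timeout elapsed: let the next request through as a trial.
                self.state = State.HALF_OPEN
            else:
                # Fail fast without touching the struggling dependency.
                raise CircuitOpenError("circuit is open")
        try:
            result = func()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self) -> None:
        self.failure_count += 1
        # A failed trial, or too many consecutive failures while closed, opens the circuit.
        if self.state is State.HALF_OPEN or self.failure_count >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()

    def _on_success(self) -> None:
        # Any success, including the half-open trial, closes the circuit and resets the count.
        self.state = State.CLOSED
        self.failure_count = 0
```

Injecting `clock` keeps the recovery timeout testable without sleeping. A production breaker would also need a lock around its state and a policy for how many half-open trials to allow.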

Let’s look at a Python example using the circuitbreaker package. You can create your own circuit breaker by subclassing CircuitBreaker:

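from circuitbreaker import CircuitBreaker, CircuitBreakerError
from requests.exceptions import RequestException

class MyCircuitBreaker(CircuitBreaker):
    FAILURE_THRESHOLD = 20                 # failures before the circuit opens
    RECOVERY_TIMEOUT = 60                  # seconds to stay open before going half-open
    EXPECTED_EXCEPTION = RequestException  # only these exceptions count as failures

@MyCircuitBreaker()
def get_friends(user_id):
    # Let RequestException propagate so the breaker can count it; wrapping it
    # here in another exception type would hide the failure from the breaker.
    return friends_service.get_friends(user_id)

def get_user_info(user_id):
    try:
        return get_friends(user_id)
    except (RequestException, CircuitBreakerError) as e:
        # CircuitBreakerError means the circuit is open and no call was made.
        raise InternalServerError from e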

We can also use the sidecar pattern for this. In this approach, we don’t have to modify our services by wrapping their calls in circuit breakers. Instead, we ship each application with a sidecar proxy such as Envoy, and all outbound traffic from the service is proxied through Envoy. Envoy supports circuit breaking out of the box. Its circuit breakers are configured as limits on connections, pending requests, requests, and retries to an upstream cluster rather than as a failure count. The following is an example circuit-breaking configuration for Envoy:

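# Part of an Envoy cluster definition; one set of thresholds per route priority.
circuit_breakers:
  thresholds:
    - priority: DEFAULT
      max_connections: 1000   # max connections Envoy opens to the cluster
      max_requests: 1000      # max parallel requests Envoy sends to the cluster
    - priority: HIGH
      max_connections: 2000
      max_requests: 2000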
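Envoy enforces these thresholds inside the proxy. The sketch below is a rough, in-process illustration of the idea behind a limit like max_requests; it is not Envoy’s implementation. It caps the number of in-flight calls and rejects the excess immediately instead of queueing it, so callers do not pile up waiting threads. `ConcurrencyLimiter` and `ConcurrencyLimitExceeded` are hypothetical names.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")

class ConcurrencyLimitExceeded(Exception):
    """Raised when a call is rejected because the in-flight limit is reached."""

class ConcurrencyLimiter:
    """Allow at most max_requests concurrent calls; reject the rest immediately."""

    def __init__(self, max_requests: int):
        self._slots = threading.BoundedSemaphore(max_requests)

    def call(self, func: Callable[[], T]) -> T:
        # Non-blocking acquire: fail fast instead of making the caller wait for a slot.
        if not self._slots.acquire(blocking=False):
            raise ConcurrencyLimitExceeded("too many outstanding requests")
        try:
            return func()
        finally:
            self._slots.release()  # free the slot whether func succeeded or raised
```

Rejecting at the limit trades a fast error for the caller against the slow thread exhaustion described earlier.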

