Writing a circuit breaker in Go

Redowan Delowar October 6, 2024

Besides retries, circuit breakers are probably one of the most commonly employed resilience patterns in distributed systems. While writing a retry routine is pretty simple, implementing a circuit breaker needs a little bit of work.

I realized that I usually just go for off-the-shelf libraries for circuit breaking and haven't written one from scratch before. So, this is an attempt to create a sloppy one in Go. I picked Go instead of Python because I didn't want to deal with sync-async idiosyncrasies or abstract things away under a soup of decorators.

Circuit breakers

A circuit breaker acts like an automatic switch that prevents your application from repeatedly trying to execute an operation that's likely to fail. In a distributed system, you don't want to bombard a remote service when it's already failing, and circuit breakers prevent that.

It has three states: Closed, Open, and Half-Open. Here's a diagram that shows the state transitions:

{{< mermaid >}}
stateDiagram-v2
    [*] --> Closed: Start
    Closed --> Open: Failure threshold reached
    Open --> HalfOpen: Recovery period expired
    HalfOpen --> Closed: Success threshold reached
    HalfOpen --> Open: Request failed

    note right of Closed: All requests are allowed
    note right of Open: Requests are blocked
    note right of HalfOpen: Limited requests allowed to check recovery

{{</ mermaid >}}

  1. Closed: This is the healthy operating state where all requests are allowed to pass through to the service. If a certain number of consecutive requests fail (reaching a failure threshold), the circuit breaker switches to the Open state.

  2. Open: In this state, all requests are immediately blocked, and an error is returned to the caller without attempting to contact the failing service. This prevents overwhelming the service and gives it time to recover. After a predefined recovery period, the circuit breaker transitions to the Half-Open state.

  3. Half-Open: The circuit breaker allows a limited number of test requests to see if the service has recovered. If these requests succeed, it transitions back to the Closed state. If any of them fail, it goes back to the Open state.

Building one in Go

Here's a simple circuit breaker in Go.

Defining states

First, we'll define the constants for our states and create the circuitBreaker struct, which holds all the configurable knobs.

This struct includes:

  • mu: A mutex to ensure thread-safe access to the circuit breaker.
  • state: The current state of the circuit breaker (Closed, Open, or HalfOpen).
  • failureCount: The current count of consecutive failures.
  • lastFailureTime: The timestamp of the last failure.
  • halfOpenSuccessCount: The number of successful requests in the HalfOpen state.
  • failureThreshold: The number of consecutive failures allowed before opening the circuit.
  • recoveryTime: The cool-down period before the circuit breaker transitions from Open to HalfOpen.
  • halfOpenMaxRequests: The number of successful requests required in the HalfOpen state before the circuit closes again.
  • timeout: The maximum duration to wait for a request to complete.
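As a rough sketch, the constants and struct described above could look like this. The field names come from the list; representing the states as plain strings is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// The three states of the breaker. Using string constants here is an
// assumption; an int-based enum would work just as well.
const (
	Closed   = "closed"
	Open     = "open"
	HalfOpen = "half-open"
)

type circuitBreaker struct {
	mu                   sync.Mutex // guards every field below
	state                string     // Closed, Open, or HalfOpen
	failureCount         int        // consecutive failures so far
	lastFailureTime      time.Time  // when the last failure happened
	halfOpenSuccessCount int        // successes seen while HalfOpen

	failureThreshold    int           // failures allowed before opening
	recoveryTime        time.Duration // cool-down before trying HalfOpen
	halfOpenMaxRequests int           // successes needed to close again
	timeout             time.Duration // max time to wait for one request
}

func main() {
	cb := &circuitBreaker{state: Closed, failureThreshold: 3}
	fmt.Println(cb.state, cb.failureThreshold) // prints: closed 3
}
```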

Initializing the breaker

Next, we provide a constructor function to initialize a new circuitBreaker instance.

This function sets the initial state to Closed and initializes the thresholds and timeout.
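A constructor along those lines might look like the sketch below. The name newCircuitBreaker and the parameter order are assumptions; the struct is repeated so the snippet compiles on its own.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const Closed = "closed"

type circuitBreaker struct {
	mu                   sync.Mutex
	state                string
	failureCount         int
	lastFailureTime      time.Time
	halfOpenSuccessCount int

	failureThreshold    int
	recoveryTime        time.Duration
	halfOpenMaxRequests int
	timeout             time.Duration
}

// newCircuitBreaker (hypothetical name) returns a breaker that starts in the
// Closed state with the given thresholds and timeout.
func newCircuitBreaker(
	failureThreshold int,
	recoveryTime time.Duration,
	halfOpenMaxRequests int,
	timeout time.Duration,
) *circuitBreaker {
	return &circuitBreaker{
		state:               Closed,
		failureThreshold:    failureThreshold,
		recoveryTime:        recoveryTime,
		halfOpenMaxRequests: halfOpenMaxRequests,
		timeout:             timeout,
	}
}

func main() {
	cb := newCircuitBreaker(3, 5*time.Second, 2, 2*time.Second)
	fmt.Println(cb.state, cb.failureThreshold, cb.recoveryTime) // prints: closed 3 5s
}
```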

Implementing the Call method

The Call method is the primary interface for executing functions through the circuit breaker. It dispatches the appropriate state handler based on the current state.

Since multiple goroutines might share the same circuit breaker, Call takes the mutex before it reads the state and holds it until the selected handler returns.
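Here's a minimal sketch of that dispatch. The handler names (handleClosedState, handleOpenState, handleHalfOpenState) are assumptions, and the handlers are stubbed out so the snippet stands alone.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

const (
	Closed   = "closed"
	Open     = "open"
	HalfOpen = "half-open"
)

type circuitBreaker struct {
	mu    sync.Mutex
	state string
}

// Call locks the breaker and hands fn to the handler for the current state.
func (cb *circuitBreaker) Call(fn func() (any, error)) (any, error) {
	cb.mu.Lock()
	defer cb.mu.Unlock()

	switch cb.state {
	case Closed:
		return cb.handleClosedState(fn)
	case Open:
		return cb.handleOpenState()
	case HalfOpen:
		return cb.handleHalfOpenState(fn)
	default:
		return nil, errors.New("unknown circuit state")
	}
}

// Stub handlers: they only exist so this sketch compiles and runs.
func (cb *circuitBreaker) handleClosedState(fn func() (any, error)) (any, error) {
	return fn()
}

func (cb *circuitBreaker) handleOpenState() (any, error) {
	return nil, errors.New("circuit open, request blocked")
}

func (cb *circuitBreaker) handleHalfOpenState(fn func() (any, error)) (any, error) {
	return fn()
}

func main() {
	work := func() (any, error) { return "ok", nil }

	cb := &circuitBreaker{state: Closed}
	result, err := cb.Call(work)
	fmt.Println(result, err) // prints: ok <nil>

	cb.state = Open
	result, err = cb.Call(work)
	fmt.Println(result, err) // prints: <nil> circuit open, request blocked
}
```

One consequence of holding the lock for the whole call is that requests going through the same breaker run one at a time. That keeps the state handling simple, at the cost of throughput.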

Handling closed states

In the Closed state, all requests are allowed to pass through. We monitor the requests for failures to decide when to trip the circuit breaker.

In this function:

  • We attempt to execute the provided function fn using runWithTimeout to handle possible timeouts.

  • If the function call fails, we increment the failureCount and update lastFailureTime.

  • If the failureCount reaches the failureThreshold, we transition the circuit to the Open state.

  • If the function call succeeds, we reset the circuit breaker to the Closed state by calling resetCircuit.
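The steps above can be sketched roughly like this. The handler name handleClosedState is an assumption, the struct is trimmed to the fields this handler touches, and runWithTimeout is replaced by a stand-in that just calls fn.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const (
	Closed = "closed"
	Open   = "open"
)

// Trimmed-down struct: only the fields the closed-state handler touches.
type circuitBreaker struct {
	state            string
	failureCount     int
	lastFailureTime  time.Time
	failureThreshold int
}

// Stand-in for the real runWithTimeout: no timeout is enforced here.
func (cb *circuitBreaker) runWithTimeout(fn func() (any, error)) (any, error) {
	return fn()
}

func (cb *circuitBreaker) resetCircuit() {
	cb.failureCount = 0
	cb.state = Closed
}

func (cb *circuitBreaker) handleClosedState(fn func() (any, error)) (any, error) {
	result, err := cb.runWithTimeout(fn)
	if err != nil {
		// Record the failure and trip the breaker once the threshold is hit.
		cb.failureCount++
		cb.lastFailureTime = time.Now()
		if cb.failureCount >= cb.failureThreshold {
			cb.state = Open
		}
		return nil, err
	}
	// A success wipes the consecutive-failure count.
	cb.resetCircuit()
	return result, nil
}

// failing is a hypothetical function that always returns an error.
func failing() (any, error) { return nil, errors.New("service down") }

func main() {
	cb := &circuitBreaker{state: Closed, failureThreshold: 2}
	for i := 0; i < 2; i++ {
		_, err := cb.handleClosedState(failing)
		fmt.Println(err, cb.state)
	}
	// prints:
	// service down closed
	// service down open
}
```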

Resetting the breaker

When a request succeeds, we reset the failure count and keep the circuit in the Closed state.
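A minimal sketch of resetCircuit, assuming it only needs to touch the failure count and the state:

```go
package main

import "fmt"

const (
	Closed   = "closed"
	HalfOpen = "half-open"
)

// Trimmed-down struct: only the fields resetCircuit touches.
type circuitBreaker struct {
	state        string
	failureCount int
}

// resetCircuit clears the consecutive-failure count and marks the circuit
// as Closed, whatever state it was in before.
func (cb *circuitBreaker) resetCircuit() {
	cb.failureCount = 0
	cb.state = Closed
}

func main() {
	cb := &circuitBreaker{state: HalfOpen, failureCount: 2}
	cb.resetCircuit()
	fmt.Println(cb.state, cb.failureCount) // prints: closed 0
}
```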

Handling open states

In the Open state, all requests are blocked to prevent further strain on the failing service. We check if the recovery period has expired before transitioning to the HalfOpen state.

Here:

  • We check if the recovery period (recoveryTime) has passed since the last failure.
  • If it has, we transition to the HalfOpen state and reset the counters.
  • If not, we block the request and return an error immediately.
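One way to sketch this handler is shown below. The name handleOpenState and the helper newOpenBreaker are assumptions. So is the return value on the transition: here the call that flips the breaker to HalfOpen returns a nil result and a nil error without running anything, and the next call becomes the first probe.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const (
	Open     = "open"
	HalfOpen = "half-open"
)

// Trimmed-down struct: only the fields the open-state handler touches.
type circuitBreaker struct {
	state                string
	failureCount         int
	halfOpenSuccessCount int
	lastFailureTime      time.Time
	recoveryTime         time.Duration
}

func (cb *circuitBreaker) handleOpenState() (any, error) {
	if time.Since(cb.lastFailureTime) > cb.recoveryTime {
		// Cool-down is over: move to HalfOpen and start counting afresh.
		cb.state = HalfOpen
		cb.failureCount = 0
		cb.halfOpenSuccessCount = 0
		return nil, nil // assumption: no request is executed on this call
	}
	return nil, errors.New("circuit open, request blocked")
}

// newOpenBreaker is a hypothetical helper: a breaker that tripped just now.
func newOpenBreaker(recoveryTime time.Duration) *circuitBreaker {
	return &circuitBreaker{
		state:           Open,
		lastFailureTime: time.Now(),
		recoveryTime:    recoveryTime,
	}
}

func main() {
	cb := newOpenBreaker(50 * time.Millisecond)

	_, err := cb.handleOpenState()
	fmt.Println(cb.state, err) // still open, request blocked

	time.Sleep(60 * time.Millisecond)
	_, err = cb.handleOpenState()
	fmt.Println(cb.state, err) // half-open, no error
}
```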

Handling half-open states

In the HalfOpen state, we allow a limited number of requests to test if the service has recovered.

In this function:

  • We attempt to execute the provided function fn.
  • If the function call fails, we transition back to the Open state.
  • If the function call succeeds, we increment halfOpenSuccessCount.
  • Once the success count reaches halfOpenMaxRequests, we reset the circuit breaker to the Closed state.
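Sketched in code, the half-open handler could look like this. The name handleHalfOpenState is an assumption, runWithTimeout is again a stand-in that just calls fn, and refreshing lastFailureTime on failure is my assumption so that the recovery period starts over.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const (
	Closed   = "closed"
	Open     = "open"
	HalfOpen = "half-open"
)

// Trimmed-down struct: only the fields the half-open handler touches.
type circuitBreaker struct {
	state                string
	failureCount         int
	halfOpenSuccessCount int
	lastFailureTime      time.Time
	halfOpenMaxRequests  int
}

// Stand-in for the real runWithTimeout: no timeout is enforced here.
func (cb *circuitBreaker) runWithTimeout(fn func() (any, error)) (any, error) {
	return fn()
}

func (cb *circuitBreaker) resetCircuit() {
	cb.failureCount = 0
	cb.state = Closed
}

func (cb *circuitBreaker) handleHalfOpenState(fn func() (any, error)) (any, error) {
	result, err := cb.runWithTimeout(fn)
	if err != nil {
		// A single failed probe sends the breaker straight back to Open.
		cb.state = Open
		cb.lastFailureTime = time.Now() // assumption: restart the cool-down
		return nil, err
	}
	cb.halfOpenSuccessCount++
	if cb.halfOpenSuccessCount >= cb.halfOpenMaxRequests {
		cb.resetCircuit() // enough successful probes: close the circuit
	}
	return result, nil
}

// failing is a hypothetical function that always returns an error.
func failing() (any, error) { return nil, errors.New("service down") }

func main() {
	ok := func() (any, error) { return "ok", nil }
	cb := &circuitBreaker{state: HalfOpen, halfOpenMaxRequests: 2}

	cb.handleHalfOpenState(ok)
	fmt.Println(cb.state) // prints: half-open
	cb.handleHalfOpenState(ok)
	fmt.Println(cb.state) // prints: closed
}
```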

Running functions with timeout

To prevent the circuit breaker from hanging on slow or unresponsive functions, we implement a timeout mechanism. You probably noticed that the state handlers that execute a request call the wrapped function through runWithTimeout.

This function:

  • Creates a context with a timeout using context.WithTimeout.
  • Executes the provided function fn in a separate goroutine.
  • Waits for either the result or the timeout.
  • Returns an error if the function takes longer than the specified timeout.
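A sketch of runWithTimeout following those four steps is below. The error message and the slow and fast helper functions are made up for the demo.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Trimmed-down struct: only the timeout matters for this helper.
type circuitBreaker struct {
	timeout time.Duration
}

func (cb *circuitBreaker) runWithTimeout(fn func() (any, error)) (any, error) {
	ctx, cancel := context.WithTimeout(context.Background(), cb.timeout)
	defer cancel()

	type outcome struct {
		val any
		err error
	}
	// Buffered so the goroutine can still send and exit after a timeout.
	ch := make(chan outcome, 1)

	go func() {
		val, err := fn()
		ch <- outcome{val, err}
	}()

	select {
	case <-ctx.Done():
		return nil, errors.New("request timed out")
	case out := <-ch:
		return out.val, out.err
	}
}

// slow and fast are hypothetical functions used only for the demo.
func slow() (any, error) {
	time.Sleep(200 * time.Millisecond)
	return "late", nil
}

func fast() (any, error) { return "ok", nil }

func main() {
	cb := &circuitBreaker{timeout: 20 * time.Millisecond}

	result, err := cb.runWithTimeout(fast)
	fmt.Println(result, err) // prints: ok <nil>

	result, err = cb.runWithTimeout(slow)
	fmt.Println(result, err) // prints: <nil> request timed out
}
```

Note that in this sketch fn never sees the context, so a timed-out function keeps running in the background until it finishes on its own. The timeout only limits how long the caller waits.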

Taking it for a spin

Let's test our circuit breaker with an unreliable service that sometimes fails.

In the main function, we'll create a circuit breaker and make several calls to the unreliable service.

This loop simulates multiple service calls, using the circuit breaker to handle failures and transitions between states.
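To give a feel for such a test drive, here's a condensed, self-contained sketch. The breaker in it is a stand-in: the three handlers are folded into Call and the timeout is left out to keep things short. The 30% failure rate, the thresholds, and the sleep durations are all made-up values, and since the failures are random the output differs from run to run.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"sync"
	"time"
)

const (
	Closed   = "closed"
	Open     = "open"
	HalfOpen = "half-open"
)

var errServiceFailed = errors.New("service failed")

// Condensed stand-in for the breaker: no timeout, handlers folded into Call.
type circuitBreaker struct {
	mu                   sync.Mutex
	state                string
	failureCount         int
	halfOpenSuccessCount int
	lastFailureTime      time.Time
	failureThreshold     int
	halfOpenMaxRequests  int
	recoveryTime         time.Duration
}

func (cb *circuitBreaker) Call(fn func() (any, error)) (any, error) {
	cb.mu.Lock()
	defer cb.mu.Unlock()

	if cb.state == Open {
		if time.Since(cb.lastFailureTime) <= cb.recoveryTime {
			return nil, errors.New("circuit open, request blocked")
		}
		// Recovery period is over; the next call becomes the first probe.
		cb.state, cb.failureCount, cb.halfOpenSuccessCount = HalfOpen, 0, 0
		return nil, nil
	}

	result, err := fn()
	if err != nil {
		cb.failureCount++
		cb.lastFailureTime = time.Now()
		if cb.state == HalfOpen || cb.failureCount >= cb.failureThreshold {
			cb.state = Open
		}
		return nil, err
	}
	if cb.state == HalfOpen {
		cb.halfOpenSuccessCount++
		if cb.halfOpenSuccessCount < cb.halfOpenMaxRequests {
			return result, nil // still probing
		}
	}
	cb.state, cb.failureCount = Closed, 0
	return result, nil
}

// unreliableService fails roughly 30% of the time (a made-up rate).
func unreliableService() (any, error) {
	if rand.Intn(10) < 3 {
		return nil, errServiceFailed
	}
	return "success", nil
}

func main() {
	cb := &circuitBreaker{
		state:               Closed,
		failureThreshold:    2,
		halfOpenMaxRequests: 2,
		recoveryTime:        200 * time.Millisecond,
	}
	for i := 1; i <= 10; i++ {
		result, err := cb.Call(unreliableService)
		fmt.Printf("call %d: state=%s result=%v err=%v\n", i, cb.state, result, err)
		time.Sleep(100 * time.Millisecond)
	}
}
```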

This prints:

The log messages will give you a sense of what's happening when we retry an intermittently failing function wrapped in a circuit breaker.

The API could be better

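One limitation of Go generics, at least at the time of writing, is that methods can't declare type parameters of their own. This means you can't define a method like func (cb *circuitBreaker) Call[T any](fn func() (T, error)) (T, error). Making the whole struct generic, as in circuitBreaker[T], does compile, but it ties each breaker instance to a single return type.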

For this, we have to use workarounds such as using any (an alias for interface{}) as the return type in our function signatures. While this sacrifices some type safety, it allows us to create a flexible circuit breaker that can handle functions returning different types.
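To make the lost type safety concrete, here's a tiny sketch of what the caller ends up doing. The call function is a hypothetical stand-in for the breaker's Call method.

```go
package main

import "fmt"

// call is a hypothetical stand-in for the breaker's Call method.
func call(fn func() (any, error)) (any, error) {
	return fn()
}

func main() {
	result, err := call(func() (any, error) { return 42, nil })
	if err != nil {
		fmt.Println("call failed:", err)
		return
	}

	// The compiler only knows result is any, so the caller has to assert
	// the concrete type back out and handle a mismatch at runtime.
	n, ok := result.(int)
	fmt.Println(n, ok) // prints: 42 true

	_, ok = result.(string)
	fmt.Println(ok) // prints: false
}
```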

Handling incompatible function signatures

What if the function you want to wrap doesn't match the func() (any, error) signature? You can easily adapt it by wrapping your function to fit the required signature.

Suppose you have a function like this:

You can wrap it like this:

Now, wrappedFunc matches the func() (any, error) signature and can be used with our circuit breaker.
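As an illustration of that adaptation, suppose the function takes an argument and returns a concrete type. The fetchUser function below is hypothetical; the closure captures the argument and widens the string result to any.

```go
package main

import "fmt"

// fetchUser is a hypothetical function whose signature doesn't match
// func() (any, error): it takes an argument and returns a string.
func fetchUser(id int) (string, error) {
	if id <= 0 {
		return "", fmt.Errorf("invalid user id %d", id)
	}
	return fmt.Sprintf("user-%d", id), nil
}

func main() {
	// The closure captures the argument and adapts the return type.
	wrappedFunc := func() (any, error) {
		return fetchUser(42)
	}

	// wrappedFunc can now be handed to anything expecting func() (any, error).
	result, err := wrappedFunc()
	fmt.Println(result, err) // prints: user-42 <nil>
}
```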

Here's the complete implementation on GitHub with tests.
