Patterns: Retry vs Circuit Breaker

Published on Sunday, July 24, 2016

Today I want to tell you about two interesting patterns: Retry and Circuit Breaker. They look similar at first sight, but they solve different problems.

Retry

Context and problem

Your application interacts with a service over the network, so it must handle possible errors. Typical errors are connection problems, temporary unavailability of the service, or timeouts caused by peak load on the service. The key factor is that these errors are transient and go away on their own after a short time.

Solution

If your application detects an error while interacting with a service, it can handle it with one of the following strategies:

  1. If the error is not transient and is likely to repeat (for example, an authorization error caused by a wrong password occurs on every attempt), the application should cancel the operation and report the error.
  2. If the error is transient and unlikely to repeat, the application can retry the request. It can also wait a short time between attempts, which increases the chance of success.
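The two strategies can be expressed as a small decision function. The sketch below is hypothetical (the `RetryDecision` enum, the `RetryStrategy` class and their parameters are not from any library), and it swaps the fixed delay for exponential backoff, a common variant in which the pause doubles after each failed attempt up to a cap.

```csharp
using System;

// Hypothetical names for the two strategies described above.
public enum RetryDecision { Cancel, RetryAfterDelay }

public static class RetryStrategy
{
    // Decide what to do after a failed attempt.
    // attempt is 1-based; maxRetries is the number of retries allowed after the first attempt.
    public static RetryDecision Decide(bool isTransient, int attempt, int maxRetries) =>
        isTransient && attempt <= maxRetries ? RetryDecision.RetryAfterDelay : RetryDecision.Cancel;

    // Exponential backoff: baseDelay * 2^(attempt - 1), capped at maxDelay.
    public static TimeSpan GetDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (attempt < 1)
            throw new ArgumentOutOfRangeException(nameof(attempt));

        // Work in double milliseconds so large attempt numbers cannot overflow.
        double ms = baseDelay.TotalMilliseconds * Math.Pow(2, attempt - 1);
        return ms >= maxDelay.TotalMilliseconds ? maxDelay : TimeSpan.FromMilliseconds(ms);
    }
}
```

With a 300 ms base delay the pauses would be 300, 600, 1200 ms and so on, never exceeding the cap.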

Example

Remote service:

using System;
using System.Web; // HttpException lives in System.Web (.NET Framework)

public interface ITransactionService
{
    void SendMoney(int sum);
}

public class TransactionService : ITransactionService
{
    private readonly Random _random = new Random();

    public void SendMoney(int sum)
    {
        // Emulate a transient failure: roughly one call in three fails
        if (_random.Next(3) == 0)
            throw new HttpException("Network problems...");

        Console.WriteLine($"Money sent. Sum {sum}");
    }
}

Application:

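using System;
using System.Diagnostics; // Trace
using System.Threading;   // Thread
using System.Web;         // HttpException

class Program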
{
    private static readonly int _retryCount = 3;
    static void Main(string[] args)
    {
        RetryPatternTest();
    }

    private static void RetryPatternTest()
    {
        var service = GetService();
        var currentRetry = 0;
        do
        {
            try
            {
                currentRetry++;
                service.SendMoney(100); //try to call remote service
                break;
            }
            //if the exception is transient and the retry limit is not exceeded,
            //log it and try again; otherwise the exception propagates to the caller
            catch (Exception ex) when (currentRetry <= _retryCount && IsTransient(ex))
            {
                Trace.WriteLine(ex);
            }
            //small delay before the next attempt (skipped by break on success)
            Thread.Sleep(300);
        } while (true);

        Console.WriteLine("Operation complete");
        Console.WriteLine($"Attempts: {currentRetry}");
    }

    private static bool IsTransient(Exception ex)
    {
        //check if Exception is transient
        return ex is HttpException;
    }

    private static ITransactionService GetService()
    {
        return new TransactionService();
    }
}

The code that calls the remote service is wrapped in a try-catch block inside a loop. The loop ends when service.SendMoney completes without errors. If the method throws an exception, the exception filter checks that the error is transient (temporary) and that the retry limit has not been exceeded; if both hold, the error is logged and the call is repeated after a short delay. Otherwise the exception propagates to the caller. The IsTransient method decides which errors count as transient, and its logic may differ depending on the environment and other conditions. This pattern is also often used to resolve optimistic-concurrency conflicts in Entity Framework:
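The loop can be extracted into a reusable helper so that each remote call does not repeat it. A minimal sketch: `RetryHelper.Execute` and its parameters are hypothetical names, and the helper keeps the semantics of the loop in the example (up to `retryCount` retries after the first attempt, non-transient errors rethrown immediately).

```csharp
using System;
using System.Threading;

public static class RetryHelper
{
    // Runs `action` until it succeeds. A failure is retried only if
    // isTransient(ex) is true and the retry budget is not used up;
    // any other exception propagates to the caller.
    // Returns the number of attempts that were made.
    public static int Execute(Action action, int retryCount, TimeSpan delay,
                              Func<Exception, bool> isTransient)
    {
        var attempt = 0;
        while (true)
        {
            attempt++;
            try
            {
                action();
                return attempt;
            }
            catch (Exception ex) when (attempt <= retryCount && isTransient(ex))
            {
                // Transient failure within the budget: wait, then try again.
                Thread.Sleep(delay);
            }
        }
    }
}
```

Usage, assuming the service from the example: `RetryHelper.Execute(() => service.SendMoney(100), 3, TimeSpan.FromMilliseconds(300), IsTransient);`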

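using System.Data.Entity;                // DbContext (Entity Framework 6)
using System.Data.Entity.Infrastructure; // DbUpdateConcurrencyException
using System.Linq;

//User entity and ConnectionString are assumed to be defined elsewhere
const int retryCount = 3;
var currentRetry = 0;
using (var context = new DbContext(ConnectionString))
{
    var user = context.Set<User>().First(o => o.Id == 1);
    user.Login = "new_user_login";
    do
    {
        try
        {
            currentRetry++;
            context.SaveChanges();
            break;
        }
        //on a conflict, refresh the original values from the database and retry;
        //the user's changes are kept ("client wins")
        catch (DbUpdateConcurrencyException ex) when (currentRetry <= retryCount)
        {
            var entry = ex.Entries.Single();
            entry.OriginalValues.SetValues(entry.GetDatabaseValues());
        }
    } while (true);
}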
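Entity Framework aside, the same idea can be shown in plain C#. In this hypothetical sketch (`VersionedStore` and `OptimisticUpdate` are illustrative names, not EF APIs), each record carries a version number, a save succeeds only if the version read earlier is still current, and on a conflict the code re-reads the record and reapplies the change. This differs slightly from the EF snippet, which keeps the user's values and only refreshes the original values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory store: every record has a version that grows on each write.
public class VersionedStore
{
    private readonly Dictionary<int, (string Value, int Version)> _rows =
        new Dictionary<int, (string Value, int Version)>();

    public void Insert(int id, string value) => _rows[id] = (value, 1);

    public (string Value, int Version) Read(int id) => _rows[id];

    // Compare-and-set: write only if nobody changed the record since it was read.
    public bool TryUpdate(int id, string newValue, int expectedVersion)
    {
        var row = _rows[id];
        if (row.Version != expectedVersion)
            return false; // concurrency conflict
        _rows[id] = (newValue, row.Version + 1);
        return true;
    }
}

public static class OptimisticUpdate
{
    // Re-reads and reapplies `modify` after each conflict, with at most
    // retryCount retries after the first attempt. Returns the number of attempts.
    // beforeSave is a test hook that lets a "concurrent" writer interfere.
    public static int UpdateWithRetry(VersionedStore store, int id, Func<string, string> modify,
                                      int retryCount, Action<int> beforeSave = null)
    {
        for (var attempt = 1; ; attempt++)
        {
            var (value, version) = store.Read(id);
            beforeSave?.Invoke(attempt);
            if (store.TryUpdate(id, modify(value), version))
                return attempt;
            if (attempt > retryCount)
                throw new InvalidOperationException("Concurrency conflict: retry limit exceeded.");
        }
    }
}
```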

When to use

Your application interacts with a remote service, and transient errors can occur. These errors are short-lived, and there is a high probability that the next request will succeed.

When not to use

  1. Errors are long-lasting, and the application would waste resources repeating the request.
  2. For handling business-logic exceptions.
  3. As an alternative to scaling. If a service reports that it is busy very often, it probably needs more resources.

Circuit Breaker

When the Retry pattern is not suitable, another pattern can help.

Context and problem

Unlike Retry, Circuit Breaker is designed for less expected errors that can last much longer: network outages, denial of service, or hardware failures. In these cases a new request is very likely to fail with the same error. For example, suppose an application calls a service with a timeout: if the application does not get a response before the timeout expires, the operation fails. When the service has problems (for example, high load), the application wastes time and other critical resources waiting for responses, even though other parts of the application need those resources. It is preferable to fail the request immediately, without waiting for the timeout, and to retry the operation only when success is likely again.

Solution

The Circuit Breaker pattern prevents an application from repeatedly trying an operation that is likely to fail, so the application can keep working without wasting critical resources while the problem persists. The pattern can also detect when the problem has been resolved and then let the application retry the operation. A circuit breaker acts as a proxy between the application and the remote service: it monitors recent errors and decides whether to let an operation through or to return an error immediately. The proxy can be implemented as a state machine with the following states:

  1. Closed: requests go from the application directly to the service. The proxy increments an error counter whenever it detects an error. If the number of errors within a given period reaches a defined threshold, the proxy switches to the Open state and starts a timer. When the timer expires, the proxy switches to the Half-Open state. The purpose of the timer is to give the service time to fix the problem before the application is allowed to send requests again.
  2. Open: requests fail immediately with an error, without reaching the service.
  3. Half-Open: a limited number of requests from the application are allowed through to the service. If all of them succeed, the proxy assumes the problem has been resolved and switches to the Closed state (the error counter is reset to 0). If any request fails, the proxy assumes the problem is still present, switches back to the Open state and restarts the timer. The Half-Open state prevents a recovering service from being flooded with requests: right after it starts working again, it may be able to handle only a limited number of requests until it fully recovers.
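The transitions in the list above can be summarized as a pure function of the current state and an event. This is a simplified sketch: the `BreakerState` and `BreakerEvent` names are hypothetical, and Half-Open is assumed to allow a single trial request, so one success closes the circuit.

```csharp
using System;

public enum BreakerState { Closed, Open, HalfOpen }

// Hypothetical events that drive the breaker.
public enum BreakerEvent { Success, Failure, TimerExpired }

public static class BreakerTransitions
{
    // failureLimitReached: whether the Closed-state error counter has hit the threshold.
    public static BreakerState Next(BreakerState state, BreakerEvent e, bool failureLimitReached = false)
    {
        switch (state)
        {
            case BreakerState.Closed:
                // Stay closed until enough errors have accumulated.
                return e == BreakerEvent.Failure && failureLimitReached
                    ? BreakerState.Open
                    : BreakerState.Closed;
            case BreakerState.Open:
                // Requests fail fast; only the timer can move us on.
                return e == BreakerEvent.TimerExpired ? BreakerState.HalfOpen : BreakerState.Open;
            case BreakerState.HalfOpen:
                // The trial request decides: success closes, failure reopens.
                if (e == BreakerEvent.Success) return BreakerState.Closed;
                if (e == BreakerEvent.Failure) return BreakerState.Open;
                return BreakerState.HalfOpen;
            default:
                throw new ArgumentOutOfRangeException(nameof(state));
        }
    }
}
```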

Example

A remote service that fails for an extended period (timeout emulation):

using System;
using System.Threading;
using System.Web; // HttpException (.NET Framework)

public interface ITransactionService
{
    void SendMoney(int sum);
}
public class TransactionService : ITransactionService
{
    private static int _counter = 0;
    public void SendMoney(int sum)
    {
        _counter++;
        Thread.Sleep(1000); //normal response time
        //calls 6 to 9 hang for a further 4 seconds and then fail,
        //emulating a service that times out for a while
        if (_counter > 5 && _counter < 10)
        {
            Thread.Sleep(4000);
            throw new HttpException("Network problems...");
        }

        Console.WriteLine($"Money sent. Sum {sum}");
    }
}

Circuit Breaker:

using System;

public enum CircuitBreakerState
{
    Closed,
    Open,
    HalfOpen
}
public class CircuitBreaker
{
    private const int ErrorsLimit = 3; //errors count limit
    private readonly TimeSpan _openToHalfOpenWaitTime = TimeSpan.FromSeconds(10); //time to wait for half open state change
    private int _errorsCount; //current errors count
    private CircuitBreakerState _state = CircuitBreakerState.Closed;
    private Exception _lastException;
    private DateTime _lastStateChangedDateUtc;

    private void Reset()
    {
        _errorsCount = 0;
        _lastException = null;
        _state = CircuitBreakerState.Closed;
    }

    private bool IsClosed => _state == CircuitBreakerState.Closed;

    public void ExecuteAction(Action action)
    {
        //state == Closed
        if (IsClosed)
        {
            try
            {
                //pass action to service
                action();
            }
            catch (Exception ex)
            {
                //error occurred, increment error counter and set last error
                TrackException(ex);
                //pass exception to application
                throw;
            }
        }
        else
        {
            //check if proxy is Half-Open
            //or if state is Open and timer expired
            if (_state == CircuitBreakerState.HalfOpen || IsTimerExpired())
            {
                _state = CircuitBreakerState.HalfOpen;

                //try to execute action
                try
                {
                    action();
                }
                catch(Exception ex)
                {
                    Reopen(ex);
                    throw;
                }
                //reset proxy state, if no error occurred
                Reset();
                return;
            }
            //if state == Open, just pass last error to application
            throw _lastException;
        }
    }

    private void Reopen(Exception exception)
    {
        _state = CircuitBreakerState.Open;
        _lastStateChangedDateUtc = DateTime.UtcNow;
        _errorsCount = 0;
        _lastException = exception;
    }

    private bool IsTimerExpired()
    {
        return _lastStateChangedDateUtc + _openToHalfOpenWaitTime < DateTime.UtcNow;
    }

    private void TrackException(Exception exception)
    {
        _errorsCount++;
        if (_errorsCount >= ErrorsLimit)
        {
            _lastException = exception;
            _state = CircuitBreakerState.Open;
            _lastStateChangedDateUtc = DateTime.UtcNow;
        }
    }
}
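Note that `TrackException` counts errors without a time window: `_errorsCount` is reset only in `Reset` and `Reopen`, so scattered failures in the Closed state add up until the limit is reached. To count only "errors for some period of time", one option is a sliding window. A minimal sketch, assuming a hypothetical `FailureWindow` class; the current time is passed in to keep it testable, and timestamps must be recorded in chronological order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: counts failures that happened within the last `window`.
public class FailureWindow
{
    private readonly Queue<DateTime> _failures = new Queue<DateTime>();
    private readonly int _limit;
    private readonly TimeSpan _window;

    public FailureWindow(int limit, TimeSpan window)
    {
        _limit = limit;
        _window = window;
    }

    // Records a failure; returns true when the failures inside the window
    // have reached the limit (i.e. the breaker should open).
    public bool RecordFailure(DateTime nowUtc)
    {
        _failures.Enqueue(nowUtc);
        Prune(nowUtc);
        return _failures.Count >= _limit;
    }

    public int Count(DateTime nowUtc)
    {
        Prune(nowUtc);
        return _failures.Count;
    }

    // Forget all failures, e.g. when the breaker closes again.
    public void Clear() => _failures.Clear();

    // Drop failures older than the window (oldest entries are at the front).
    private void Prune(DateTime nowUtc)
    {
        while (_failures.Count > 0 && nowUtc - _failures.Peek() > _window)
            _failures.Dequeue();
    }
}
```

In `CircuitBreaker`, `TrackException` could call `_failureWindow.RecordFailure(DateTime.UtcNow)` instead of incrementing `_errorsCount`, and `Reset` could call `Clear()`.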

Proxy:

public class TransactionServiceProxy : ITransactionService
{
    private readonly ITransactionService _service = new TransactionService();
    private readonly CircuitBreaker _circuitBreaker = new CircuitBreaker();

    public void SendMoney(int sum)
    {
        _circuitBreaker.ExecuteAction(() => _service.SendMoney(sum));
    }
}

Application:

using System;
using System.Diagnostics; // Stopwatch
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        var service = GetService();
        for (int i = 0; i < 100; i++)
        {
            var sw = Stopwatch.StartNew();
            try
            {
                service.SendMoney(100);
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Error occurred: {ex.Message}. Wait for 500 milliseconds");
                Thread.Sleep(500);
            }
            finally
            {
                sw.Stop();
                Console.WriteLine($"Elapsed time: {sw.ElapsedMilliseconds} ms");
            }
        }
    }

    private static ITransactionService GetService()
    {
        return new TransactionServiceProxy();
    }
}

The Circuit Breaker pattern adds stability while a system recovers from a failure and minimizes the impact on performance. State changes can also be reported as events for monitoring and for notifying administrators about problems. If you report only transitions to the Open state, you can significantly reduce the number of notifications.
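One way to expose state changes for monitoring is a C# event raised on every transition; subscribers then decide what to report, for example only transitions to Open. A sketch with hypothetical names (`ObservableBreakerState`, `MoveTo`, `StateChanged`); a breaker such as `CircuitBreaker` would call `MoveTo` wherever it currently assigns `_state`.

```csharp
using System;

public enum BreakerState { Closed, Open, HalfOpen }

// Hypothetical holder of the breaker state that raises an event on each transition,
// so monitoring code can subscribe without the breaker knowing about it.
public class ObservableBreakerState
{
    public BreakerState Current { get; private set; } = BreakerState.Closed;

    // Raised with (previous, next) whenever the state actually changes.
    public event Action<BreakerState, BreakerState> StateChanged;

    public void MoveTo(BreakerState next)
    {
        if (next == Current)
            return; // no event when nothing changes
        var previous = Current;
        Current = next;
        StateChanged?.Invoke(previous, next);
    }
}
```

A subscriber that alerts only when `next == BreakerState.Open` sends one notification per outage instead of one per failed request.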

When to use

To prevent an application from calling a remote service or accessing a shared resource when an error is highly likely and the errors are long-lasting.

When not to use

  1. For accessing local private resources of the application (for example, an in-memory data structure): a circuit breaker would only add overhead.
  2. As a substitute for handling exceptions in the business logic.