Circuit Breakers

Every building today has one, and you’ve probably already seen one. It is essential for your safety and for keeping your electronics from breaking.

Circuit breakers are designed to protect an electrical circuit from damage caused by too much current flowing through it. The breaker switches off automatically, interrupting the current flow until somebody can fix the problem.

They are famous on Wall Street too, where they are also called “trading curbs”. They are used to prevent dramatic losses and speculative gains: when the market falls or rises a lot in a short time frame, the breaker opens and stops trading.

Circuit breakers are a common pattern in distributed systems too. The pattern was described by Michael T. Nygard in his famous book Release It!

Today most computer systems can be considered distributed systems, or at least make requests to external services. The way of communicating can vary, but the vast majority of our systems exchange information across the internet.

In the constant search for more reliable and stable systems, engineers need to be careful about everything that can go wrong. The internet isn’t 100% trustworthy, and when you’re making requests between two points over it, you need to keep in mind that a lot of things can fail, as we learned from the famous “Fallacies of distributed computing”. That’s why integration points can be considered a stability antipattern.

In this context, circuit breakers have become a popular choice for handling HTTP errors, but they can be used to protect other critical operations too.

What happens to your system if another system or operation starts to fail? You’ll start to have a cascading failure: “A cascading failure occurs when an error in one system affects others, with the initial failure walking down into your systems layer causing other errors.”

Cascading failure spreading across services.

The algorithm is simple and short. It has two main states, “open” and “closed”, and it starts closed (the normal state). While the circuit is closed, everything is working as expected. If an error occurs during the execution of the handled operation, the circuit starts to track the number of errors; if the number of errors within a certain time window crosses the threshold, the circuit opens.

The second main state is “open”: the operation is not executed, because a sequence of N errors in X time has opened the circuit.

After some (also predefined) time in the open state, the circuit changes to a transition state called “half-open”. In this state, the circuit executes the next call to the handled operation. If the execution succeeds, the circuit goes back to the closed state; if it fails, the circuit goes back to the open state and waits for the next change to half-open.

Obviously, in some cases a single successful request isn’t enough to switch the state back to closed, so the number of required successes can be configured too.
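To make the state transitions concrete, here is a minimal in-memory sketch in Ruby. It is only one possible reading of the algorithm: the class name, the parameter names (threshold, window, reset_timeout) and the injectable clock are assumptions made for illustration, and it closes again after a single successful half-open call.

```ruby
# Minimal circuit breaker sketch. All names here are illustrative assumptions.
class CircuitBreaker
  class OpenError < StandardError; end

  attr_reader :state

  # threshold:     number of failures inside `window` that opens the circuit
  # window:        seconds during which failures are counted
  # reset_timeout: seconds to stay open before allowing a half-open trial call
  # clock:         callable returning the current time in seconds (handy for tests)
  def initialize(threshold: 3, window: 10, reset_timeout: 5,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @threshold = threshold
    @window = window
    @reset_timeout = reset_timeout
    @clock = clock
    @state = :closed
    @failures = [] # timestamps of recent failures
    @opened_at = nil
  end

  # Runs the block unless the circuit is open. Errors raised by the block
  # are counted and then re-raised to the caller.
  def call
    if @state == :open
      raise OpenError, "circuit is open" if @clock.call - @opened_at < @reset_timeout
      @state = :half_open # timeout elapsed: let one trial call through
    end

    begin
      result = yield
    rescue StandardError
      record_failure
      raise
    end

    close if @state == :half_open # a successful trial call closes the circuit
    result
  end

  private

  def record_failure
    now = @clock.call
    @failures << now
    @failures.reject! { |t| now - t > @window } # forget failures outside the window
    trip(now) if @state == :half_open || @failures.size >= @threshold
  end

  def trip(now)
    @state = :open
    @opened_at = now
  end

  def close
    @state = :closed
    @failures.clear
  end
end
```

You would wrap the risky operation in the block, for example breaker.call { fetch_invoice } where fetch_invoice is a hypothetical method of yours, and rescue CircuitBreaker::OpenError to fall back while the circuit is open.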

Circuit breaker states

Technical Details

Exceptions or response structures

If you search the internet, you’ll find a lot of different ways to implement circuit breakers: some people follow the suggestions of the Release It! book, and others go their own way.

The first thing you need to pay attention to is how you’ll know whether the operation succeeded or failed. You have two good options here: you can signal it with exceptions or with the response.

With exceptions, the advantage is that it’s easy to start using a circuit breaker, because exceptions are already present in a lot of languages. For your own operations you only need to raise an exception when something goes wrong, and almost all 3rd party libs already raise exceptions. Personally, I don’t like this approach. I know that some people like to control flow with exceptions, but I don’t consider it a good way, and it’s important to remember that some languages, like Rust, don’t have exceptions.

Alternatively, you can know whether your operation failed based on its response, which looks much smoother, more readable, and simpler. But what does the operation need to return as its “response”? It can return any structure that says whether the operation succeeded. For example, if you are working with an object-oriented language, you only need to return an object that responds to a message like success?, which is much more “object-oriented”.

class Response
  def success?
  end
end

This approach of returning response structures has become popular in recent years, and “hyped” languages encourage it. For example, Go has a built-in error type that is an interface, and it’s common for an operation to return its result and an error (if any):

f, err := os.Open("filename.ext")
if err != nil {
    log.Fatal(err)
}

Rust has its own result type too, called Result<T, E>. If you’re dealing with functional languages, in Elixir it is common to return a tuple whose first item is the return status of the function, which you can pattern match against:

{:ok, %{"age" => 22, "name" => "Otavio Valadares"}}

For these reasons, I think the best way to handle your responses when dealing with circuit breakers (and any operation) may be with response structures.
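As a rough illustration of the response-structure style in Ruby, the sketch below returns an object that answers success? instead of raising. The Result struct, its ok and failure constructors, and the parse_age operation are all hypothetical names invented for this example.

```ruby
# Hypothetical response structure: any object answering `success?` would do.
Result = Struct.new(:value, :error) do
  # Builds a successful result carrying a value.
  def self.ok(value)
    new(value, nil)
  end

  # Builds a failed result carrying an error description.
  def self.failure(error)
    new(nil, error)
  end

  def success?
    error.nil?
  end
end

# Hypothetical operation that reports failure through its return value
# instead of raising an exception.
def parse_age(input)
  age = Integer(input, exception: false) # nil when input is not a valid integer
  age ? Result.ok(age) : Result.failure("not a number: #{input.inspect}")
end
```

A circuit breaker built this way would look at result.success? to decide whether to count a failure, so the handled operation never needs to raise.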

Track errors on 3rd party systems or memory

Another thing you need to deal with is how you’ll track errors. The first obvious option is to rely on a 3rd party service like a Redis database: you only need a key to increment, with a TTL. But with this approach you create a shared resource, which is a stability antipattern too: if your Redis goes down, will your entire application go down too? You’ll run into the famous Quis custodiet ipsos custodes? (who watches the watchmen?), because your connection to Redis will not be fault tolerant.

Shared Resource

The second option is to keep the error tracking in memory, in some kind of structure that stores the count and the timing. In an application handling a lot of operations, storing these structures in memory can cause memory problems. I know that in the vast majority of cases this is not a problem today, but if you’re dealing with low-memory applications it may be.
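One way to keep the memory cost small is a fixed-window counter that stores only a count and a start time per circuit, which is the same idea as a key incremented with a TTL. This is a sketch under that assumption; FailureWindow and its method names are made up for illustration.

```ruby
# Fixed-window failure counter: two numbers per circuit, so memory use stays
# constant no matter how many failures are recorded. Names are illustrative.
class FailureWindow
  # window: length of the counting window in seconds
  # clock:  callable returning the current time in seconds (handy for tests)
  def initialize(window:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @window = window
    @clock = clock
    @count = 0
    @started_at = nil
  end

  # Records one failure and returns the count for the current window.
  def record
    now = @clock.call
    if @started_at.nil? || now - @started_at >= @window
      @started_at = now # previous window expired: start a new one
      @count = 0
    end
    @count += 1
  end

  # Failures seen in the current window (0 once the window has expired).
  def count
    expired? ? 0 : @count
  end

  private

  def expired?
    @started_at.nil? || @clock.call - @started_at >= @window
  end
end
```

The trade-off is precision: a fixed window can miss a burst that straddles two windows, while keeping one timestamp per failure is exact but grows with the failure rate.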

In some cases you may want to go the Redis way for some reason. In that case, I recommend using an in-memory circuit breaker to watch the Redis connection. In other cases, I think in-memory tracking is the best way to solve this problem.


Observability is another important thing when working with a circuit breaker. If your circuit opens, it can save your application from some errors, but the true magic happens when this information can be used by your stakeholders and by other applications to change their behavior automatically. For example, if the circuit that handles your 3rd party payment partner opens, you can hide the payment tab in the mobile app.

You need to provide the status of your circuits (or of a group of them, like all the circuits that handle some 3rd party partner, or a critical operation such as saving an invoice) somewhere. It can be as simple as exposing it in your healthcheck route, but an endpoint can lead to problems if another application needs to check your circuit breaker status every time it is about to do an operation.
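For the healthcheck option, the body could be as small as the sketch below. The health_payload helper, the "ok"/"degraded" labels and the JSON layout are assumptions for illustration, not a standard format.

```ruby
require "json"

# Builds a JSON healthcheck body from a hash of circuit name => state.
# The field names and status labels are illustrative assumptions.
def health_payload(circuits)
  all_closed = circuits.values.all? { |state| state == :closed }
  payload = {
    "status" => all_closed ? "ok" : "degraded",
    "circuits" => circuits.transform_values(&:to_s)
  }
  JSON.generate(payload)
end
```

Calling health_payload("payments" => :open, "invoices" => :closed) would report a degraded status together with the state of each circuit, which a client can use to decide what to show.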

A good approach is to publish a notification message to your favorite message broker: if an application is interested in this message, it reads it and makes its own decisions. Another solution that can be considered is building a “circuit breaker control plane”, an application that knows the state of all the circuit breakers in your company, but that will lead you to a serious bus factor issue.
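The notification idea can be sketched with a tiny in-process publisher. In a real setup a subscriber would forward the event to whatever broker you use; CircuitEvents and the event fields below are hypothetical and stand in for that integration.

```ruby
# In-process stand-in for a message broker: subscribers receive every
# circuit state change. Class and field names are illustrative assumptions.
class CircuitEvents
  def initialize
    @subscribers = []
  end

  # Registers a handler block that is called with each published event.
  def subscribe(&handler)
    @subscribers << handler
  end

  # Builds a state-change event, hands it to every subscriber, and returns it.
  def publish(circuit:, from:, to:)
    event = { circuit: circuit, from: from, to: to }
    @subscribers.each { |handler| handler.call(event) }
    event
  end
end
```

A circuit breaker would call publish on every transition, and each interested application decides on its own what to do with the event, for example hiding a feature while the circuit is open.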

It’s important to show the state changes and the current states of your circuit breakers to humans too. It’s a good idea to put them on your Grafana dashboards, in some cases to have a bot that notifies your Slack channel, and to integrate them with your alerting system, like OpsGenie.

Service Mesh

Coding circuit breaker logic for every application can be frustrating, and even if you’re using a library it can be boring to install and set it up in each one. The service mesh was created with all that boring, repeated application-level logic in mind, and one thing that almost every service mesh has in its sidecars is a circuit breaker. If you’re using a service mesh, you don’t need to code it at the application level.

But I know that service mesh technologies are not a reality for a lot of companies today, and putting your circuit breakers at the code level can be a good way to start using them.


The circuit breaker is a good pattern that can improve the stability of your applications. It’s worth starting to use it and launching a stability culture at your company if it doesn’t have one already.

Final thought

If you have any questions that I can help you with, please ask! Send me an email, PM me on Twitter, or comment on this post!


Newsletter #2 05/2020

If we’re going to talk about April 2020 computer posts and papers, we need to talk about the ACM Digital Library: it’s free until June 30, 2020. I’m already panning the library in search of things related to compilers and distributed systems.

This past month there was an interesting question on r/AskComputerScience about free research papers, and it got nice responses pointing to free resources.

The first modern pandemic – This is a long post (20 min read) by Gates about the COVID-19 pandemic. It has a great introduction covering growth, the differences between countries, and people’s behavior. But the focus of the post is the innovation needed to beat the pandemic, which it separates into five categories: treatments, vaccines, testing, contact tracing, and policies for opening up. If you want to read something non-tech, I highly recommend it. There is a short version too if you want something quicker.


Things I Wished More Developers Knew About Databases – So far the best post I’ve read this month, with a lot of interesting things about databases.

GitHub is now free for teams – GitHub is now free for teams with unlimited private repos and collaborators.

Comparing HTTP/3 vs. HTTP/2 Performance – HTTP/3 is coming, so let’s analyze its differences and improvements.

Ask HN: What are your favorite low-coding apps / tools as a developer? – You can find answers with a lot of interesting tools to make your life easier: automatic admins, internal tools, easy apps, and much more. For people who love a prototype and an MVP, it can be valuable.

Free SRE Books – Three free books about SRE by Google, including the famous “Site Reliability Engineering” book.


Untangling Microservices, or Balancing Complexity in Distributed Systems – A good post about microservices and what they mean.

Crafting “Crafting Interpreters” – Bob Nystrom finished his “Crafting Interpreters” book. For those who don’t know, he spent the last three years writing a book about interpreters, and it’s completely free. I’ve been reading it for the last few months and it’s fascinating, and simple! I highly recommend it to everyone who wants to read more about compilers.

Our Government Runs on a 60-Year-Old Coding Language, and Now It’s Falling Apart – Have you noticed that a lot of critical systems are written in COBOL? This post talks about it.

CODE IS ENGINEERING, TYPES ARE SCIENCE – This post discusses how software engineering relates to the three ways of reasoning: deduction, abduction, and induction.

History of Erlang and Elixir – Great post with a summarized history of Erlang. It doesn’t take long to read and it’s worth it.

An Exploratory Guide to the Service Mesh Platforms – An overview of the main service mesh technologies that we have today.

Graphs – This is not a new post, but it can be useful for those who don’t know graphs very well. Graphs are at the core of computation, and knowing them can be useful.

Refactoring a Function in Elixir – A simple step-by-step refactoring post, showing how to refactor a function in Elixir, applying good concepts.

Books I recommend to my software engineering students – This post recommends six books that can help you become a better software engineer, and none of them is about programming.

Conversations with a six-year-old on functional programming – This post was trending on Hacker News and caught my attention. It can be interesting even for those who don’t know functional programming.

A Possible New Backend for Rust – Nice and technical post discussing a new back-end for Rust Lang based on Cranelift that is under development. It can improve the build time of Rust when you’re developing.

Ruby Concurrency Final Report – Thread scheduler for lightweight concurrency and a brief about concurrency in Ruby ecosystem.

Microservices Tradeoffs – I love microservices and I advocate for them, but are they worth it for a one-person project? Let’s see some tradeoffs.

The Computer Scientist Who Can’t Stop Telling Stories – Donald Knuth is a computer science legend, known for his famous work The Art of Computer Programming. This interview with him is interesting, and I recommend it.

Ask HN: How to rediscover the joy of programming? – Are you tired of programming? Let’s discuss it.

Tutorials / Courses

CS 241 – System Programming – Great material about system programming, if you love low-level programming it can be useful.

The Power Of Prolog – Prolog is an interesting language (I have an excellent post about it on my blog), and this online book sounds good for those who want to learn more about it.

Build an operating system in Rust programming language – Build an operating system using Rust. It sounds like an interesting way to learn how an OS works behind the scenes.

Using Broadway and RabbitMQ to Create a Data Pipeline in Elixir – Broadway has caught my attention over the past months. It’s a good way to build pipelines and workers on top of Elixir’s GenStage.

Understanding bytes in Go by building a TCP protocol – I always think that someday I’ll try to re-implement a protocol based on its RFC. This tutorial leads you to build a TCP protocol in Go, very interesting!

Releases / News

The final Python 2 release marks the end of an era – The last official version of Python 2, Python 2.7.18, has been released. Let’s talk a little about everything around Python 2 and 3.

Rust 1.43.0 – Rust 1.43.0 released.

Ubuntu 20.04 – The new LTS version of Ubuntu is ready.