Circuit Breakers

Every building today has one, and you’ve probably already seen one. It is essential for your safety and prevents your electronics from breaking.

Circuit breakers are designed to protect your electrical circuit from damage caused by too much current flowing through it: the breaker switches automatically to interrupt the current flow until somebody can fix it.

It is famous on Wall Street too, where it is called a “trading curb”: it’s used to prevent dramatic losses and speculative gains, and when the market falls or rises a lot in a short time frame, it trips and trading stops.

Circuit breakers are a common pattern in distributed systems too; the pattern was described by Michael T. Nygard in his famous book Release It!.

Today most computer systems can be considered distributed systems, or at least make requests to external services. The way of communicating can vary, but 90% of our systems are exchanging information across the internet.

In a constant search for more reliable and stable systems, engineers need to be careful about everything that can go wrong. The internet isn’t 100% trustworthy, and when you’re making requests between two points across it, you need to keep in mind that a lot of things can fail, as we learned with the famous “Fallacies of distributed computing”; that’s why integration points can be considered a stability antipattern.

In this context, circuit breakers have become a popular choice to handle HTTP errors, but they can be used to protect other critical operations too.

What will happen to your system if another system or operation starts to fail? You’ll start to have a cascading failure: “A cascading failure occurs when an error in one system affects others, with the initial failure propagating down through your system layers and causing other errors.”

Cascading failure spanning across services.

The algorithm is simple and short. It has two main states, “open” and “closed”, and we start it closed (the normal state). When the circuit is closed, everything is working as expected; if an error occurs during the execution of the handled operation, the circuit starts to track the number of errors, and if the number of errors within a certain time window crosses the threshold, the circuit opens.

The second main state is “open”: the handled operation will not be executed, because a sequence of N errors in X time has made the circuit open.

After some time (predefined too) in the open state, the circuit changes to a “transition state” called half-open. In this state, the circuit executes the next call to the handled operation: if the execution succeeds, the circuit goes back to the closed state; if it fails, the circuit goes back to open and waits for the next change to half-open.

Obviously, in some cases a single successful request isn’t enough to switch the state back to closed, and that can be configured too.

Circuit breaker states
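To make the states concrete, here is a minimal sketch in Ruby of these transitions. It assumes the handled operation returns a response object with a success? method (discussed later in this post); the thresholds, timings, and names are illustrative, not a production-ready library.

class CircuitBreaker
  def initialize(error_threshold: 5, error_window: 60, open_timeout: 30)
    @error_threshold = error_threshold # failures tolerated inside the window
    @error_window    = error_window    # window size in seconds
    @open_timeout    = open_timeout    # seconds to stay open before half-open
    @state           = :closed
    @errors          = []              # timestamps of recent failures
    @opened_at       = nil
  end

  # Runs the protected operation (the block) unless the circuit is open.
  def call
    return :circuit_open if state == :open # fail fast, skip the operation

    response = yield
    response.success? ? record_success : record_failure
    response
  end

  def state
    # After the open timeout has passed, allow one trial call (half-open).
    @state = :half_open if @state == :open && Time.now - @opened_at >= @open_timeout
    @state
  end

  private

  def record_success
    @state  = :closed
    @errors = []
  end

  def record_failure
    now = Time.now
    @errors << now
    @errors.reject! { |t| now - t > @error_window } # keep only recent errors
    return unless @state == :half_open || @errors.size >= @error_threshold

    @state     = :open
    @opened_at = now
  end
end

Wrapping a call would then look like breaker.call { payment_client.charge(order) }, where payment_client.charge is just a placeholder for whatever operation you want to protect.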

Technical Details

Exceptions or response structures

If you search on the internet, you’ll find a lot of different ways to implement circuit breakers, some following the suggestions of the Release It! book and others taking different paths.

The first thing you need to pay attention to is how you’ll know whether the operation succeeded or failed. You have two good options here: you can control it with exceptions or with the response.

When handling it with exceptions, you have the advantage that it’s easy to start using a circuit breaker, because exceptions are already present in a lot of languages: for your own operations you only need to raise an exception when something goes wrong, and when dealing with 3rd party libs, almost all of them will raise exceptions. Personally, I don’t like this approach; I know that some people like to control flow based on exceptions, but I don’t consider it a good way, and it’s important to remember that some languages, like Rust, don’t have exceptions.

You can also know whether your operation has failed based on its response, which looks much smoother, more readable, and simpler. But what does the operation need to return as its “response”? The operation can return any structure that says whether the operation succeeded; for example, if you are working with an object-oriented language, you only need to return an object that responds to a message like success?, which is much more “object-oriented”:

class Response
  def initialize(success)
    @success = success
  end

  def success?
    @success
  end
end

This approach of returning response structures has become popular in recent years, and “hyped” languages encourage it. For example, Go has its built-in error type, which is an interface, and it’s common for an operation to return its result together with an error (if any), for example:

f, err := os.Open("filename.ext")
if err != nil {
    log.Fatal(err)
}

Rust has its result type too, called Result<T, E>. If you’re dealing with functional languages, in Elixir it’s common to return a tuple whose first item is the return status of the function, which you can pattern match against:

{:ok, %{"age" => 22, "name" => "Otavio Valadares"}}

For these reasons, I think the best way to handle your responses when dealing with circuit breakers (and any operation) may be with response structures.

Track errors in a 3rd party system or in memory

Another thing you need to deal with is how you’ll handle the error tracking. The first obvious option is to trust a 3rd party service like a Redis database: you only need a key to increment, with a TTL. But with this approach you create a shared resource, which is a stability antipattern too; if your Redis goes down, will your entire application go down with it? You’ll face the famous “Quis custodiet ipsos custodes?” problem, because your connection with Redis will not be fault tolerant.

Shared Resource
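As a rough sketch of the Redis approach (assuming the redis gem; key names, threshold, and window are illustrative): each failure increments a counter that expires after the error window, and the circuit opens once the shared counter crosses the threshold.

require "redis"

REDIS = Redis.new

def record_failure(circuit, window: 60)
  key = "circuit:#{circuit}:errors"
  count = REDIS.incr(key)   # atomic increment shared by every instance
  REDIS.expire(key, window) # reset the expiry so old failures fade away
  count
end

# Somewhere in your error handling:
open = record_failure("payment-partner") >= 5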

The second option is to keep this error tracking in memory: you have some kind of structure that stores the count and the timing. But when working with an application handling a lot of operations, you can have memory problems storing these structures; I know that for 95% of the cases this is not a problem today, but if you’re dealing with low-memory applications it may be.

In some cases you may want to use the Redis approach for some reason; in that case, I recommend using an in-memory circuit breaker to watch the Redis connection itself. In all other cases, I think in-memory tracking is the best way to solve this problem.

Observability

Observability is another important thing when working with circuit breakers. If your circuit opens, it can save your application from some errors, but the real magic happens when this information can be used by your stakeholders and by other applications to change their behavior automatically; for example, if the circuit that handles your 3rd party payment partner opens, you can hide the payment tab in your mobile app.

You need to provide the status of your circuits (or a group of them, like all circuits that handle some 3rd party or a critical operation such as saving an invoice) somewhere. It can be as simple as exposing it in your healthcheck route, but exposing it on an endpoint can lead to problems if another application needs to check your circuit breaker status every time it performs an operation.
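As a sketch of the healthcheck idea (assuming Sinatra and a hypothetical CIRCUITS registry that maps circuit names to breaker objects exposing their state):

require "sinatra"
require "json"

get "/health" do
  content_type :json
  # CIRCUITS is a hypothetical registry, e.g. { "payment-partner" => breaker }
  { circuits: CIRCUITS.transform_values(&:state) }.to_json
end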

A good approach is to publish a notification message to your favorite message broker; if an application is interested in this message, it reads it and makes its own decisions. Another solution that can be considered is building a “circuit breaker control plane”, an application that knows the state of all circuit breakers of your company, but it will lead you to a big bus factor issue.
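A sketch of the notification idea, using Redis pub/sub as a stand-in for whatever broker you already run (the channel name and payload are illustrative):

require "redis"
require "json"
require "time"

def publish_state_change(circuit, new_state)
  event = { circuit: circuit, state: new_state, at: Time.now.utc.iso8601 }
  Redis.new.publish("circuit-breaker.events", event.to_json)
end

# Called whenever a circuit switches state:
publish_state_change("payment-partner", :open)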

It’s important to show the circuit breakers’ state switches and current states to humans too: it’s good to put them on your Grafana dashboards, in some cases have a bot that notifies your Slack channel, and integrate them with your alerting system, like OpsGenie.

Service Mesh

Coding circuit breaker logic for every application can be frustrating, and even if you’re using a library, it can be boring to install and set it up in every application. Thinking about all that repeated logic at the application level, service meshes were created, and one thing that almost every service mesh ships in its sidecars is a circuit breaker; if you’re using any service mesh, you don’t need to code it at the application level.

But I know that service mesh technologies are not a reality for a lot of companies today, and putting your circuit breakers at the code level can be a good way to start using them.

Conclusion

The circuit breaker is a good pattern that can bring an improvement in stability to your applications. It’s worth starting to use it and launching a stability culture at your company if it doesn’t have one already.

Final thought

If you have any questions that I can help you with, please ask! Send an email (otaviopvaladares@gmail.com), PM me on Twitter, or comment on this post!


A tale about application infrastructure

Today we’re facing the container revolution, but I feel like most people don’t know why we are using it, what problems existed before it, or the history behind the evolution of application deployment.

In this post, I talk about the evolution of application infrastructure. Obviously, this is a topic with enough material to be a book, but I tried to summarize it in a few lines so you can understand the key points and the background quickly.

Physical Server

A long time ago, when developing applications, companies usually ran them on typical physical rack servers; these servers were basically like your computer, running an application on top of an operating system.

Physical Server X-ray

It was very common to find small and medium companies that had a small room inside their office with a classic 19-inch rack and a server running one application; large companies usually built whole buildings called “datacenters”, with tons of racks, only to host their applications.

It was only possible to scale in the way we know today as “vertical scaling”: you just added more hardware to the next slot of your rack and that was it.

Physical Server Vertical Scaling

This model has a lot of problems. The first one is that one server usually runs only one application, and if this application doesn’t fully use the server’s resources, it leaves resources unused.

It was very expensive too, and not every company had the budget to buy it; this was a problem for small or recently founded companies.

At the beginning of the internet this worked, but as the internet grew, this model was not viable anymore.

Virtual Host

A few years later, RFC 2616 introduced us to HTTP/1.1 and, with it, as described in the RFC, the ability to send a “Host” header in a request providing the host and port information from the target URI, enabling the origin server to distinguish among resources while servicing requests for multiple hostnames on a single IP address.

This technology is called virtual hosting, and its most famous application is shared web hosting: with it, one server can host multiple websites. The price of hosting a website decreased, and many businesses started to offer website hosting for a few dollars.

The mechanism is simple: the server analyzes the Host header of the incoming request, and in its configuration you set up something like “requests for this site go to this system path”. With this, you have multiple websites whose DNS names resolve to the same IP address.
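For example, this is roughly what the browser sends; the server only needs the Host line to decide which site’s files to serve (the hostname and path are illustrative):

GET /index.html HTTP/1.1
Host: www.example.com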

Virtual Host Working

At this point, hosting a web application was easier than on bare metal servers, but it still had problems, and that takes us to the next step…

Virtual Machine

Time goes on, and a technology from the late 60s started to be used massively: I’m talking about operating system virtualization. This concept allows a single piece of hardware to host many operating systems or applications, each one with its own operating system and environment, while still sharing the same hardware resources.

Virtual Machine X-ray

Using virtualization, a company just needs to buy a server with strong hardware and boot as many VMs as it wants (and as the hardware supports). Another nice ability is to build custom OS images with pre-installed system requirements.

This new way of building application infrastructure maximized resource utilization and simplified application architecture; the price to deploy an application decreased, and, most important, the growth of virtualization came with the first IaaS companies, offering virtual machine allocation with “one click”, the most famous example being Amazon Web Services (AWS) with its famous EC2 service.

The period of biggest growth in IT operations around the world came at the same time: millions of users using your application became a reality, cloud computing arrived, and new ways of thinking about infrastructure arrived with it. Microservices architecture came in response to large applications, and now engineers don’t need to deploy a single monolith application, they need to deploy a lot of small applications.

Following the giant IT operations and tech companies, DevOps emerged, and now companies make dozens of deploys per day, so new requirements on infrastructure showed up. Some problems of virtualization came to mind, like the fact that it’s not totally optimized: images are large and booting a new “instance” was sometimes slow.

A new technology trend called “containers” came with the promise of changing the way we think about IT infrastructure and solving all the problems related to virtual machines.

Containers


While virtual hosts, virtual machines and all that story happened, researchers around the world were working too, and they began to advance an implementation based on an old but gold UNIX feature called chroot, the forerunner of OS-level virtualization.

With a great time skip: in 2013 a technology called Docker was announced, built on top of Linux containers (LXC), and it drew the attention of engineers around the world because it solves a lot of virtual machine problems.

Linux containers are a group of processes that are isolated from the rest of the system. Think of it like a box isolated from the world: inside this box you can put your application and all its dependencies, and it will run isolated from the rest of the operating system while still using the same kernel as the other processes.

Image illustrating putting application dependencies inside container

But now you may ask me: what’s the difference between Linux containers and virtual machines? The difference is simple and powerful.

Difference between containers and virtual machines

Linux containers provide process-level virtualization and don’t need to emulate the whole OS like virtual machines do; they share the kernel with the host OS (you can see in the image that Linux containers don’t boot an entire OS to work). This makes containers lightweight: they can boot in milliseconds (a VM usually needs minutes to boot), container images are smaller than VM images, containers have better performance than virtual machines, and they are still secure, because each container and its processes are isolated from the rest of the system.

Containers are great not only for infrastructure, but they changed the way we develop too: you don’t have the famous “works on my machine” problem anymore, because you can code your application locally using containers, and the environment your application runs in on your machine will be the same one it runs in on your infrastructure.

Engineers started to deploy their applications using containers (and to develop with them too), probably using the famous Docker containers, but they weren’t using 100% of the potential of containers; it seemed that something was missing, and thankfully the missing “something great” didn’t take long to appear.

Container Orchestration

It was the missing piece to use 100% of Linux containers’ power: a simple and beautiful way of thinking about infrastructure and application management. The way of thinking about it is now totally different from the beginning, when you only thought about a single server running your application.

Let’s think about the concept. You have something that we usually call the “operator”; the operator is like a big brother watching everybody inside its cluster. The cluster is composed of N nodes, which usually are different machines, and inside each machine we have N Linux containers (here’s the magic!).

As the name already says, the operator operates our cluster, and it knows something like a recipe that the developer writes to tell the orchestrator how the application should behave when deployed.

This recipe tells standard things (and sometimes peculiarities of the chosen orchestration technology); a sketch of such a recipe follows the diagram below, but it typically includes things like:

  • Number of containers that your application will use (number of replicas)
  • Memory and CPU reservation
  • Healthchecks
  • Horizontal/Vertical scaling rules
Orchestrator and its nodes running containers

The image illustrates a cluster of four nodes, each one running N containers, and the central orchestrator.
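As a rough sketch of such a recipe, using Kubernetes manifest syntax as one concrete example (the names, image, and numbers are purely illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3                      # number of containers (replicas)
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:1.0.0
          resources:
            requests:              # memory and CPU reservation
              memory: "256Mi"
              cpu: "250m"
          livenessProbe:           # healthcheck
            httpGet:
              path: /health
              port: 8080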

When the orchestrator starts to do its job, your infrastructure gains some kind of life and behaves like an organism: it boots new containers, kills old containers, replaces unhealthy containers, scales your application based on metrics, and shares traffic between your N replicas. Your application gains resilience and performance, and a universe of possibilities opens up in your infrastructure based on this basic concept that combines Linux containers with clustering, load balancing, and metrics.

Following this concept, a lot of technologies have emerged and become popular for orchestrating containers, like ECS, Docker Swarm, and the most hyped of them, Kubernetes (K8s). Each one has its own peculiarities and properties (Kubernetes being the most complex).

Summarizing: container orchestration is the automation of all aspects of coordinating and managing your containers; it manages the lifecycle, scaling, redundancy and much more for you.

Today, millions of people are using your software and you need to think about resiliency, scalability, monitoring and much more. Container orchestration solves a lot of problems related to this, but it’s hard to get it working: it is heavily coupled with clustering, load balancing and other concepts that have grown together with the technologies described in this text.

What’s next

We’ve talked about some important steps in the evolution of application infrastructure until we arrived at today’s powerful container orchestration, which allows us to deal with today’s problems. But what’s next? What are the next problems we will face when developing? What technology will solve them? Only time will answer these questions, but the lesson we take is to always keep studying and adapting to new trends.

Final thought

If you have any questions that I can help you with, please ask! Send an email (otaviopvaladares@gmail.com), PM me on Twitter, or comment on this post!

