Microservices Tradeoffs and Design Patterns

Let aside the reason why we should and should not jump into Microservices from previous post , here we talk more about what Tradeoffs of Microservices and Design Patterns that are born to deal with them.

Building Microservices is not easy like installing some packages into your current system. Actually you will install a lot of things :). The beauty of Microservices lies on the separation of services that enable each module to be developed independently and keep each module simple. But that separation also is the cause of new problems.

More I/O operations ?

First issue that we can easily to recognize is the emerging of I/O calls between separated services. It exactly looks like when we integrate our system to 3rd party services, but this time, all that 3rd party services is just out internal ones. To have correct API calls, there will be efforts to document and synchronize knowledge between teams handling different services.

But here is the bigger problem, if every services has to keep a list of another services addresses (to call their APIs), they become tight coupled, means strong dependent between each other and it destroys the promised scalability of Microservices. So it is when the Event-Driven style comes to rescue.

Event Driven Design Pattern

Example tools : RabbitMQ, ActiveMQ, Apache Kafka, Apache Pulsar, and more

Main idea with this pattern is to allow services not need to know about each others addresses. Each service just need to know an event pipe, or a message broker and entrust it for distributing its message and feeding back data from other services. There will be no direct API call between services. Each services only fires some events to the pipe, and listen on some events happened from the pipe.

Along with this design pattern, the mindset on how to storing data is required some escalations too. We will not only store STATE of entities, but also store the stream of EVENTs that construct that STATE. This storing strategy also is very effective when dealing with concurrent modifications on the same entity that can cause inconsistent in data. There are 2 approaches to store and consume events : by using the Queue and using the Log that we will discover in later topics.

More Complex Query Mechanism ?

It is obviously there will be moments that we need to query some data that need the co-operation between multiple services. In the past with monoliths style, when all data of all services is located in the same database, writing an SQL query is simple. But in Microservices style, it can’t. Each service secures its own database as a recommended practice. We suddenly can’t JOIN tables, we lost the out-of-the-box rollback mechanism from database’s Transaction feature in case of something wrong with storing data, we may have a longer delay while each service may have to wait for data from other services. And those obstacles turn Event Driven to be a “must have” design for Microservices system since that design is the foundation to support patterns solving this Querying issue, most common are Event Sourcing, CRSQ, and Saga.

Event Sourcing

It can be a bit confusing between terms Event Driven vs Event Sourcing. Event Driven is about communication mechanism between services , since Event Sourcing is about coding solution inside each service to retrieve a state of an entity: instead of fetching the entity from the database, we reconstruct it from an event stream. The event stream can be stored in many ways: it can be stored on a database’s table, or it can be read from Event-Driven supported components such as Apache Kafka, or RabbitMQ, or using some dedicated event stream database like EventStore, etc. This method brings new responsibility to developers that they will have to create and maintain the reconstructing algorithms for each type of entity .

As mentioned at previous section, this strategy is helpful when dealing with concurrent data modification scenario, something like collaboration features that can be seen in Google Docs or Google Sheets, or simply to deal with scenario that 2 user hit “Save” on the same form at very closed moments. But this reconstructing way is not so friendly to a more complex query which is so natural with traditional database like Oracle or PostgresSQL, the SELECT * WHERE ones. So, to cover this drawbacks, each service usually also maintain a traditional database to store states of entities and using it for querying. And this combination form a new pattern called : CQRS (Command and Query Responsibility Segregation) where the read and the write on an entity happens on different databases.

CQRS (Command and Query Responsibility Segregation)

As mention above, this pattern is to separates read and update operations for a data store. A service can use Event Sourcing technique for update an entity, or construct an memory based database such as H2 database to quickly store updates on entities, while as quick as possible to persist the calculated states of entities back to a SQL database for example. This pattern prevents the data conflict while there are many updates on a single entity come at the same time while also keep a flexible interface for query data.

This pattern is effective for scaling purpose since we can scale the read database and the write database independently, and fit for high load scenario when the writing requests can complete quicker because it reduces calls to database with potential delay from locking mechanism inside databases. Quicker response mean there will be more room for other requests, especially in thread-based server technology such as Servlet or Spring.

A drawback of this pattern is the coding complexity. There is more components join in the process, there will be more problem to handle. So it is not recommended to use this way in cases that the domain or business logics are simple. Simple features is nice fit with traditional CRUD method Overusing anythings is not good. I also want to remind that if the whole system does not have special needs on the load, or write-heavy features, it is not recommended to switch to Microservices too. (reason is here )

Saga

Saga means a long heroic story. And the story about Transaction inside Microservices is truly heroic and long. Transaction is an important feature for a database that aim to maintain the data consistency, it prevents partial failure when updating entities. With distributed services, we are having distributed Transactions. Now, the mission is how to co-ordinate those separated Transactions to regain attributes of a single Transaction : ACID (atomicity, consistency, isolation, durability) over distributed services . We can understand simply that : Saga is a design pattern aim to form the Transaction for Microservices.

Saga patterns is about what system must do if there is a failure inside a service. It should somehow reverse some previous successful operations to maintain data consistency. And the simplest way is to send out messages to ask some services to rollback some updates. To make a Saga, developers may have to anticipate a lot scenarios that an operation can fail. The more high level solution for rollback mechanism is to implement some techniques like Semantic lock or Versioning entity. We can discuss about this in other topics. But the point here is it also brings much complexities to the source code. The recommendation is to divide services well to avoid writing too much Saga. If there are some services that are tight coupled, we should think about merging them into one Monoliths service again. Saga is less suitable for tight coupled transaction.

More Deployment Effort ?

Back to Monoliths realm, the deployment means running a few command lines to build an API instances and to build a client side application. When go with Microservices, obviously we are having more than 1 instance, and we need to deploy each instance, one by one.

To reduce this effort, we can use some CI/CD tools such as Jenkins, or some available Cloud base CI/CD out there. We also can write ourself tools , it won’t be difficult. But there is still some more issues than just running command lines.

Log Aggregation

Logging is vital practice when building any kind of application to provide the picture of how system is doing and to troubleshoot issues. Checking logs on separated services can be not very convenient in Microservices so it is recommended to stream logs to one center. There are many tools dedicated for this purpose nowadays such as GreyLog or Logstash. The most famous stack for collecting, parsing and visualizing for now is ELK which is the combination of ElasticSearch + Logstash + Kibana. The drawback of those available logging technology is it requires a bit much RAM and CPU, mostly to support searching logs. For small projects, preparing a machine that is strong enough to run ELK stack may not very affordable. Logstash requires about 1-2 GB is plenty enough. GreyLog requires ElasticSearch so it also require about 8GB RAM and 4 Cores CPU. ELK is much more than that.

Health Check & Auto restart

Beside Logging, we also must have a way to keep track availability of services. Each service may have its own API /healthcheck that we can have a tool to periodically call to to check whether it’s alive or not. Or we can use proactive monitoring tools such as Monit or Supervisord to monitor ports / processes and configure its behavior when some errors occur, such as sending emails or notifications to the Slack channel.

Beside Heath Check, each service should have auto-restarting ability when something take it down. We can configure for a process to start up whenever the machine is up by adding scripts to /etc/init.d or /etc/systemd for most of Linux server. For processes, we can make use of Docker to automatically bring services up right after it is down. For the machine itself, if we use physical machine, we should enter BIOS and set up Auto-Restart when power is on. If we use Cloud machines, it is no worry.

Those techniques are not only recommended for Microservices but also for any Monoliths system to ensure the availability.

Circuit Breaker

This is for when bad things happen and we have no way to deal with it but accepting. There is always such situation is life. For some reasons, one or many services is down or become so slow due to network issues that it will makes user wait long just for a button click. Most of users are impatient and they will likely to retry the pending action, a lot and you know system can got worser. It is when a Circuit Breaker take action. It’s role is just similar to electric circuit breaker , is to prevent catastrophic cascading failure across system. The circuit breaker pattern allows you to build a fault tolerant and resilient system that can survive gracefully when key services are either unavailable or have high latency.

The Circuit Breaker must be placed between client and actual servers containing services. Circuit Breaker has 2 main states: Closed, Open. The rules among those states are:

  • At Closed state, Circuit Breaker just forward request from clients to behind services.
  • Once Circuit Breaker discovers a fail request or high latency, it change status to Open.
  • In Open state, Circuit Breaker will return errors to client’s requests immediately, so the user acknowledge the failure and it is better than let users wait, and it also reduces the load to the system.
  • Periodically, Circuit Breaker makes retry-call to behind services to check their availability. If behind services is good again, it changes to Closed state, if not it remain Open state.

Luckily we may don’t have to implement this pattern ourself. There are available tools out there such as : Hystrix – a part of Netflix OSS, or Istio – the community one

Service Discovery

As we mentioned at Event Driven section, services inside a Microservices no need to know each own addresses by using an Event channel. But what if the team does not familiar with Event style and decide not to use it, or the services is simple enough to just expose REST APIs only. Using Event Driven is not a must-do, and in this case, how do we solve the addressing problem between services.

When system need to be scaled, there will be more instances for one or many services need to be added, or removed, or just be moved around. To let every services know the address (IP , port ) of others, we need a man in the middle that keep the records about service’s addresses and keep it up to date. This module is called Service Discovery ad usually be used along with Load Balancing modules. We may discuss about this more on other topics.

We also no need to create this component from scratch. There are some tools out there such as : etcd, consul, Apache ZooKeeper. Let’s give a try with them.

Ending

Above is an overview of what we need to know when moving to Microservices. Make sure you google them all before really starting. Each of patterns will have its pros and cons and overcoming solutions that another topics will cover. Thanks for reading !!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s