The Challenges of Logging Microservices

Debugging and monitoring microservices is not a trivial problem but quite a challenging one. I use the word challenging on purpose: there is no silver bullet for this, no tool that you can install and that works like magic, but there are some practices that can help you.

Microservices in a nutshell

In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. – Martin Fowler

You can think of microservices like this:

  • a number of services expose their APIs
  • they communicate with each other and the outside world

A sample microservices topology could look like this:
[Figure: a sample microservice architecture – Source: Apigee Blog]
In the example above there are nine small services communicating with each other, and they expose four interfaces for different applications: for the front end and for the back end.

What can be a microservice?

A microservice can be anything that does one thing, but does that one thing well.
Each Program Does One Thing Well – Unix Philosophy
Examples of microservices are:

  • Authentication service
  • Email sending
  • Image resizing
  • HTTP APIs for given resource types
  • etc.

Communication types

When microservices communicate with each other, the two most common approaches are HTTP and messaging.

Synchronous via HTTP

Microservices can expose HTTP endpoints so that other services can consume their functionality.
But why HTTP? HTTP is the de facto standard way of exchanging information – every language has some HTTP client (yes, you can write your microservices in different languages). We have the toolset to scale it, so there is no need to reinvent the wheel. Have I mentioned that it is stateless as well?
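To make this more tangible, here is a minimal sketch of such an endpoint – it assumes Express and keeps its data in memory just for the example:
```javascript
// A minimal sketch of a microservice exposing an HTTP API (assuming Express).
const express = require('express');

const app = express();

// In a real service this would be backed by a database; the data here is made up.
const users = { 1: { id: 1, name: 'Ada' } };

app.get('/users/:id', (req, res) => {
  const user = users[req.params.id];
  if (!user) {
    return res.status(404).json({ error: 'user not found' });
  }
  res.json(user);
});

app.listen(3000, () => console.log('users service listening on port 3000'));
```
Any other service – written in any language – only needs an HTTP client to call GET /users/1 on this service.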

Asynchronous via queues/messages

Another way for microservices to communicate with each other is to use messaging queues like RabbitMQ or ZeroMQ. This way of communication is extremely useful when talking about long-running worker tasks or mass processing. A good example is sending out a massive amount of emails – when an email has to be sent, it is put into a queue, and the email microservice processes the queued messages and sends them out.
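As a rough sketch – assuming RabbitMQ and the amqplib client, with a made-up 'emails' queue and message shape – the producer and the worker could look like this:
```javascript
// A sketch of queue-based communication, assuming RabbitMQ and the amqplib client.
const amqp = require('amqplib');

// Producer: any service can drop an email job into the queue and move on.
async function enqueueEmail(to, subject) {
  const connection = await amqp.connect('amqp://localhost');
  const channel = await connection.createChannel();
  await channel.assertQueue('emails', { durable: true });
  channel.sendToQueue('emails', Buffer.from(JSON.stringify({ to, subject })), {
    persistent: true,
  });
  await channel.close();
  await connection.close();
}

// Consumer: the email microservice processes the jobs at its own pace.
async function startEmailWorker() {
  const connection = await amqp.connect('amqp://localhost');
  const channel = await connection.createChannel();
  await channel.assertQueue('emails', { durable: true });
  channel.consume('emails', (message) => {
    const job = JSON.parse(message.content.toString());
    console.log('sending email to', job.to); // the actual sending would happen here
    channel.ack(message);
  });
}
```
The producer does not have to wait for the email to be sent – it only waits for the message to be queued, which is what makes this a good fit for long-running or bulk work.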

Debug challenges

If you move from a monolithic application to microservices, one of the biggest challenges you will face is the lack of stack traces through services.
What happens if one of the microservices in your infrastructure starts throwing stack traces? Wouldn't it be great if you could trace the request back to its origin and play it back to see what happened? To be able to do that, you have to assign a Request ID to each of your requests and log them. As your microservices should be stateless, if you record everything, it should be easy to play back the whole request through your infrastructure.
This approach solves another problem as well: you can implement your services in as many programming languages as you like, and you will still have this playback ability.
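A minimal sketch of what this could look like with Express – the 'x-request-id' header name is just a common convention, the downstream URL is made up, and the built-in fetch assumes Node.js 18+:
```javascript
// A sketch of request ID propagation. Reuse the ID an upstream service assigned,
// otherwise create a new one, and forward it on every outgoing call.
const crypto = require('crypto');
const express = require('express');

const app = express();

app.use((req, res, next) => {
  req.requestId = req.headers['x-request-id'] || crypto.randomUUID();
  res.setHeader('x-request-id', req.requestId);
  next();
});

app.get('/orders/:id', async (req, res) => {
  // Forward the same ID so the logs of every service touched by this request
  // can be stitched together later.
  const response = await fetch('http://users-service/users/1', {
    headers: { 'x-request-id': req.requestId },
  });
  res.json({ requestId: req.requestId, user: await response.json() });
});

app.listen(3001);
```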

Logging challenges

So far so good – you are logging your requests with IDs, but you still have to interpret them in some way. To do so, you have to push your logs to a centralized logging application, like Logstash.
Once you have that, you can make the logs searchable and show the results in a nice, easily understandable way using Elasticsearch and Kibana – in short, the ELK stack.
Also, instead of setting up your own cluster of services, you can choose to go with Loggly. Almost every language has a client already written and ready to be used with Loggly – and not just clients, but also plugins for the most common logging libraries (like winston-loggly or bunyan-loggly for Node.js).
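For example – assuming winston 3.x on the Node.js side – logging structured JSON with the request ID attached is what makes the centralized store searchable; whether the output ends up in Logstash or Loggly is just a transport and configuration detail:
```javascript
// A sketch of structured logging with winston (3.x assumed). The service name
// and the requestId value are made up for the example.
const winston = require('winston');

const logger = winston.createLogger({
  level: 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.json()
  ),
  defaultMeta: { service: 'email-service' },
  transports: [new winston.transports.Console()],
});

// Every entry carries the request ID, so a single request can be played back
// across all the services it touched.
logger.info('email queued', { requestId: 'req-42', to: 'user@example.com' });
```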
Still, Kibana is not the best tool to display throughput. Wouldn’t it be better to see something like this?
[Figure: microservice topology showing throughput and response times]
This shows five microservices (Users, Tags, Products, Locations, Categories) communicating with each other – the thicker the link, the bigger the throughput. The colors show the response time of each service – green means the latency is under a given threshold.

Performance – how to detect bottlenecks?

Detecting bottlenecks is not an easy task. Let's say you have a request that takes 106ms to complete – where do you look for clues?
Tools like New Relic made things easier, but they are not the best fit for microservices. When I take a look at a diagram, I want to see instantly what is taking so long.
One thing that can help is being able to inspect individual transactions and see what is going on. The image below shows Google's Cloud Trace in action, breaking down how the 106ms adds up for the /add_point endpoint. Basically, Cloud Trace provides distributed stack traces.
[Figure: Google Cloud Trace breaking down the /add_point request]
Sadly, it is only available on the Google Cloud, and only for RPCs.
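Short of a full tracing tool, even logging per-request timings together with the request ID gives you raw data for narrowing down where the milliseconds go – a rough Express sketch:
```javascript
// A sketch of per-request timing: not distributed tracing, but enough to see
// which endpoint of which service the time is spent in.
const express = require('express');

const app = express();

app.use((req, res, next) => {
  const start = process.hrtime.bigint();
  res.on('finish', () => {
    const durationMs = Number(process.hrtime.bigint() - start) / 1e6;
    // With the request ID attached, these entries can be joined across services.
    console.log(
      JSON.stringify({
        requestId: req.headers['x-request-id'],
        path: req.path,
        durationMs: Math.round(durationMs),
      })
    );
  });
  next();
});

app.get('/ping', (req, res) => res.send('pong'));

app.listen(3002);
```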

Alerting

If something goes wrong, you have to be notified instantly – as we already talked about using Logstash, it is common sense to tunnel these events into Riemann as well. If you don't know Riemann: it aggregates events from your servers and applications with a powerful stream processing language.

In Riemann, you can set alerts and send them to PagerDuty, which creates an incident for you. In PagerDuty, you can then ask to be alerted via push notifications, SMS, or even phone calls.
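One simple way to get such events out of a service is to ship them as JSON lines over TCP – the sketch below assumes a Logstash TCP input with a JSON codec on a made-up host and port, from where the events can be routed on to Riemann and, if an alerting rule matches, to PagerDuty:
```javascript
// A sketch of shipping alert-worthy events as JSON lines over TCP.
// The host, port and event shape are assumptions for the example.
const net = require('net');

function sendEvent(event) {
  const socket = net.createConnection({ host: 'logstash.internal', port: 5000 }, () => {
    socket.end(JSON.stringify(event) + '\n');
  });
  socket.on('error', (err) => console.error('could not ship event:', err.message));
}

sendEvent({
  service: 'email-service',
  state: 'error',
  description: 'email provider returned 5xx for 2 minutes',
  requestId: 'req-42',
});
```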

Solve them all!

At RisingStack, we faced all these issues, so we decided to build a tool that saves you a lot of time in your own infrastructure. It will be open-sourced soon, supporting applications built with Node.js first, with support for other languages in the pipeline as well. With this tool, you will be able to take a high-level view of your infrastructure and see the throughput of each microservice as well as its latency. You will also be able to trace individual requests through your microservices – distributed tracing.
Until then, check out the Top 5 Advantages of the Microservices Pattern for more!
