Best Practice - Distributed Tracing and OverOps

Distributed tracing is a method used to profile and monitor applications, especially those built using a microservices architecture. Distributed tracing helps pinpoint where failures occur and what causes poor performance.

However, distributed tracing is not an easy problem to solve. Some of the challenges include:

  1. A microservice could be invoked multiple times during a single business transaction.
  2. Microservices could be called in any order, so it is extremely difficult to impose a predefined set of restrictions/conditions before a microservice is invoked.
  3. As data is modified as part of the business flow, tracking it across the various microservices gets tricky.

So how does one troubleshoot this? Where does OverOps fit into this picture?

Before I get into this, I want to highlight an important caveat: the ‘design practice’ described below does require code changes.

The simplest way to get visibility into this process is to use what is referred to as “Baggage”.
Baggage is any unique value that identifies a business transaction over the course of its lifetime. This could be a simple immutable ID such as “TransactionID”, “VisitorID” or “CustomerID” that the microservices can use. If no such immutable ID is available, an alternative is to create a unique “Baggage ID” and pass it across these services.
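A minimal Java sketch of choosing and propagating a Baggage ID, assuming services exchange string headers on each call. The class `BaggageIds`, the header name `X-Baggage-ID` and the UUID fallback are illustrative assumptions, not part of OverOps or of any specific framework. The sketch prefers an existing immutable business ID, generates one otherwise, and reuses an incoming ID so every downstream service sees the same value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical helper for picking and passing along a Baggage ID.
final class BaggageIds {
    // Hypothetical header name used to carry the Baggage ID between services.
    static final String HEADER = "X-Baggage-ID";

    private BaggageIds() {}

    // Prefer an existing immutable business ID (e.g. a CustomerID);
    // if it is missing or blank, generate a random UUID instead.
    static String resolve(Optional<String> businessId) {
        return businessId.filter(id -> !id.isBlank())
                .orElseGet(() -> UUID.randomUUID().toString());
    }

    // Reuse the Baggage ID from an incoming request when present,
    // so the whole business transaction keeps a single ID.
    static String fromIncoming(Map<String, String> headers, Optional<String> businessId) {
        String existing = headers.get(HEADER);
        return existing != null ? existing : resolve(businessId);
    }

    // Return a copy of the outgoing headers with the Baggage ID added.
    static Map<String, String> propagate(String baggageId, Map<String, String> outgoing) {
        Map<String, String> copy = new HashMap<>(outgoing);
        copy.put(HEADER, baggageId);
        return copy;
    }
}
```

If your stack already uses a tracing library, its own propagation mechanism (for example the W3C `baggage` header) may be a better carrier than a custom header.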

To troubleshoot effectively, application teams then need to follow the best practice of adding this “Baggage ID” to every exception, logged error or warning they write into the system.
This allows the business flow to be tracked across microservices through this ID.
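The practice above might look like this in Java, assuming `java.util.logging` and a hypothetical `BaggageException` type that embeds the Baggage ID in its message. `OrderService` and `parseQuantity` are made-up names for illustration; the point is only that both the logged error and the thrown exception carry the same ID, so either can be searched for it.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical exception type that always carries the Baggage ID,
// so the ID appears in the message wherever the exception is captured.
class BaggageException extends RuntimeException {
    private final String baggageId;

    BaggageException(String baggageId, String message, Throwable cause) {
        super(tag(baggageId, message), cause);
        this.baggageId = baggageId;
    }

    // Prefix a message with the Baggage ID in a consistent, searchable format.
    static String tag(String baggageId, String message) {
        return "[BaggageID=" + baggageId + "] " + message;
    }

    String getBaggageId() {
        return baggageId;
    }
}

// Hypothetical service code showing the ID on both the log line and the exception.
final class OrderService {
    private static final Logger LOG = Logger.getLogger(OrderService.class.getName());

    private OrderService() {}

    static int parseQuantity(String baggageId, String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            // Log the error with the Baggage ID, then rethrow with the same ID.
            LOG.log(Level.SEVERE, BaggageException.tag(baggageId, "invalid quantity: " + raw), e);
            throw new BaggageException(baggageId, "invalid quantity: " + raw, e);
        }
    }
}
```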

To take this one step further, enabling “OverOps Tiny Links” (also known as ARC links) associates each Baggage ID with a corresponding Tiny Link.
For more details, see the OverOps documentation on ARC links.

When tracing a transaction across microservice boundaries, OverOps provides several advantages:

  1. The ability to find out “What, When and Why” a distributed transaction breaks:
    By associating a “Baggage ID” with a tiny link, OverOps provides details on the exact microservice, version, line of source code and variables associated with the problem.
  2. Visibility into the individual microservices of a business transaction, even if it doesn’t fail:
    This can be achieved by using log.warning for visibility purposes. OverOps takes a snapshot of every warning, which gives additional troubleshooting context even when the business transaction doesn’t fail.
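A sketch of item 2 using `java.util.logging`, whose `Logger.warning` method corresponds to the `log.warning` call mentioned above. `PricingStep` and the fallback-price scenario are hypothetical: the step succeeds either way, but when it falls back to a default it records a warning tagged with the Baggage ID. `CapturingHandler` exists only to make the warning observable in a test; this sketch does not show how OverOps itself captures warnings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical business step that succeeds but may emit a warning.
final class PricingStep {
    static final Logger LOG = Logger.getLogger(PricingStep.class.getName());

    private PricingStep() {}

    // Returns the catalog price, or the default when the catalog has none.
    // The fallback is not a failure, but the warning (tagged with the
    // Baggage ID) leaves a trace of it for later troubleshooting.
    static long priceInCents(String baggageId, Long catalogPrice, long defaultPrice) {
        if (catalogPrice == null) {
            LOG.warning("[BaggageID=" + baggageId + "] catalog price missing, using default " + defaultPrice);
            return defaultPrice;
        }
        return catalogPrice;
    }
}

// Hypothetical test helper that collects every record it is given.
final class CapturingHandler extends Handler {
    final List<LogRecord> records = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
        records.add(record);
    }

    @Override
    public void flush() {}

    @Override
    public void close() {}
}
```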

For more information, please contact