Notes about logs




Introduction

Sooner or later, the work on information systems raises the subject of logs, which, in turn, raises the subject of data exchanged within the system.

Often, if it is not always, startups start building their own information systems. And, for various raisons, they don't pay attention to logs. By the time they realize the importance of the subject, it could be too late : their developers strive to keep the system running, but they become less and less successful. They need visibility over the system. They would like to review all the logging strategy. But this is an enormous amount of work and they don't have time. The company is struggling to keep its head above the water. There is no time, and no money, to invest into refactoring.

The architecture may not be adapted to the environmental's constraints. Or may be, some parts of the system are poorly implemented. But, in order to diagnose the causes of the problem, we need visibility. We need information about the system. Logs may be the best way to provide information. Good logs can provide valuable information for planning efficient intervention.

How to design good logs ?

In order to answer this question we should ask other questions :

When do we need to consult the logs ?

When we need information about the system’s behavior. Typically : something went wrong, and we need to find out why. Or we want to get synthetic data about the system in order to classify some priorities. That is, we need to consult the logs any time we need information about the system’s behavior. Therefore, the next question is :

What defines the global behavior of a system ?

The system’s global behavior results from :

Therefore, before we can give an answer to the question « How to design good logs ? » we need to describe the interactions within the information system and the interaction between the system and the outside world.

This study will allow us to describe the « main content » of the logs’ messages.

So let’s talk about systems.

What makes a system behave the way it does ?

Often, if it is not always, when we start talking about logging system, developers adopt the paradigm that characterise the information system they are working on. For example, if their information system is based upon the WOA paradigm (Web-oriented architecture), then they use the expressions "WEB services", requests...

That's all right. But there are many paradigms. For example, WOA is just one declination of SOA (Service Oriented Architecture) among others. And, if you take some distance from the information system's point of view, you realize that SOA is just a projection of some more general concepts.

So let's take some distance with the SOA paradigm. Let's try to formulate the issue regardless of a specific application.

The general issue is :

We have a system.

An observer located outside the system only see the entry points and the exit points. He has no knowledge about the « functional units ».

Let’s review the concepts expressed earlier. The global behavior of a system results from interactions:

The following image resumes and illustrates the concepts previously introduced.

This schema represents a system composed of 9 functional units (FU1, FU2…). The system interacts with 4 other systems: « Client », « Big Client », « Google » and « Yahoo ».

Next, we will focus on data needed to manage the log messages.

Message management : the implication
Assumptions

First, let's make some assumptions :

The first assumption is inevitable : in large information systems it is impossible to guarantee that logs are saved in the order in which they are issued. This assumption implies that we need to include into each message data that will be used to reconstruct the flow of messages as it should (chronologically) be.

The second assumption implies that redundancy is not necessary in order to ensure resilience : we don't have to inject the same data into distinct messages in order to ensure that the information is correctly saved, despite the loss of a certain proportion of messages.

Messages structures

That said, here is the data that should be included in the messages :

The following JSON structure presents the message’ structure :

{
   "timestamp":...,
   "actionId":"...",
   "requestId":"...",
   "order":...,
   "environnement":"...",
   "type":"...",
   "body":{
      ...
      }
   }
}
Bodies for « behavior related messages »
Type REQUEST

The body of a message of type REQUEST must contain:

Type RESPONSE

The body of a message of type RESPONSE must contain:

Type I-ACTION and IA-ACTION (incoming actions)

The body of a message of type « (synchronous or asynchronous) incoming action » (I-ACTION, IA-ACTION) must contain:

Type O-REACTION or O-ACK (reactions to incoming action)

The body of a message of type « outgoing reaction » (O-REACTION) or « outgoing acknowledgement » (O-ACK) must contain:

Type O-ACTION or OA-ACTION (outgoing actions)

The body of a message of type « (synchronous or asynchronous) outgoing action » (O-ACTION, OA-ACTION) must contain:

Type I-REACTION or I-ACK (reactions to incoming actions)

The body of a message of type « incoming reaction » (I-REACTION) or « incoming acknowledgement » (I-ACK) must contain:

Bodies for « informative messages »

The body for all types of informative messages depends on the information within the message. There is no predefined structure to describe this information.

"body":{
          ...
}
Summery
I-ACTION (incoming synchronous action) -> O-REACTION
{
   "timestamp":1419767441,
   "actionId":"randomValue",
   "requestId":"",
   "order":1,
   "environnement":"Paris-1",
   "type":"I-ACTION",
   "body":{
      "id":"myClient",
      "data":{

      }
   }
}
O-REACTION (outgoing reaction) <- I-ACTION
{
   "timestamp":1419767000,
   "actionId":"randomValue", 
   "requestId":"",
   "order":1000,
   "environnement":"Paris-1",
   "type":"O-REACTION",
   "body":{
      "status":"SUCCESS",
      "data":{
          ...
      }
   }
}
IA-ACTION (incoming asynchronous action) -> O-ACK
{
   "timestamp":1419767441,
   "actionId":"randomValue",
   "requestId":"",
   "order":1,
   "environnement":"Paris-1",
   "type":"IA-ACTION",
   "body":{
      "id":"myClient",
      "data":{
          ...
      }
   }
}
O-ACK (outgoing acknowledgement) <- IA-ACTION
{
   "timestamp":1419767000,
   "actionId":"randomValue", 
   "requestId":"",
   "order":1000,
   "environnement":"Paris-1",
   "type":"O-ACK",
   "body":{
      "status":"SUCCESS",
      "data":{
          ...
      }
   }
}
O-ACTION (outgoing synchronous action) -> I-REACTION
{
   "timestamp":1419767450,
   "actionId":"randomValue",
   "requestId":"theRequestId",
   "order":10,
   "environnement":"Paris-3",
   "type":"O-ACTION",
   "body":{
      "actionId":"theOutGoingActionId",
      "addressee":"Client",
      "data":{
          ...
      }
   }
}
I-REACTION (incoming reaction) <- O-ACTION
{
   "timestamp":1419767480,
   "actionId":"randomValue",
   "requestId":"theRequestId",
   "order":10,
   "environnement":"Paris-3",
   "type":"I-REACTION",
   "body":{
      "actionId":"theOutGoingActionId",
      "status":"SUCCESS",
      "data":{
          ...
      }
   }
}
OA-ACTION (outgoing asynchronous action) -> I-ACK
{
   "timestamp":1419767450,
   "actionId":"randomValue",
   "requestId":"theRequestId",
   "order":10,
   "environnement":"Paris-3",
   "type":"OA-ACTION",
   "body":{
      "actionId":"the-OA-ActionId"
      "addressee":"Google",
      "data":{
          ...
      }
   }
}
I-ACK (incoming acknowledgement) <- OA-ACTION
{
   "timestamp":1419767480,
   "actionId":"randomValue",
   "requestId":"theRequestId",
   "order":10,
   "environnement":"Paris-3",
   "type":"I-ACK",
   "body":{
      "actionId":"the-OA-ActionId",
      "status":"SUCCESS",
      "data":{
          ...
      }
   }
}
REQUEST (request) -> RESPONSE
{
   "timestamp":1419767700,
   "actionId":"randomValue",
   "requestId":"see note",
   "order":1,
   "environnement":"Paris-1",
   "type":"REQUEST",
   "body":{
      "requestId":"theRequestId",
      "addressee":"calledFU",
      "data":{
          ...
      }
   }
}

Note about the key « message.requestId » Please note that for messages issued by the entry point that was the target of an « (synchronous or asynchronous) incoming action » to the system, this value is not defined. For messages issued by any other of other interface (exit points or internal interfaces), then this value is not empty. In the latter, the key’s value is the ID of the request (between functional units) that triggered the actual request.

RESPONSE (response) <- REQUEST
{
   "timestamp":1419767484,
   "actionId":"randomValue",
   "requestId":"theRequestId",
   "order":10,
   "environnement":"Paris-3",
   "type":"RESPONSE",
   "body":{
      "requestId":"theRequestId",
      "status":"SUCCESS",
      "data":{
          ...
      }
   }
}