Thursday, February 16, 2017

Oracle Linux – forked logging to prevent script execution delay in high speed environments

Logging has always been important when developing code, with the changing way companies operate their IT footprint and move to a more DevOps logging becomes even more important. In a more DevOps oriented IT footprint logging is used to constantly monitor the behavior of the running services in the company and feed the information back to the DevOps team to take action on issues as well as to continuously improve the code.

As DevOps teams tend to consume much more logging than traditional oriented IT organizations the direct effect is that a lot more logging routines are developed and included in the code. While traditionally the number of lines that are written to log files are relatively limited the number of lines written to log files in a DevOps footprint can grow exponentially.

Even though this might not look like a challenge in the first instance it might cause an issue in environments with a lot of services deployed in combination with a high level of logging “steps” and a high level of execution. When developing code and when writing logging where you write to a log file you have to realize that every time you do send a line of logging to the logfile it actually takes cpu cycles, memory interaction, I/O operations to the file system as well as potentially waiting for lock on the file to be able to write to it. All this is resulting in a delay during the execution of your code.

Traditional logging implementation
In “traditional” code and especially in scripting like for example is done in bash scripting, you will see a lot of implementations like shown below. Between start and finish we have two “real” functions that will do something, after the execution of the function in the script flow the script will write some logging to a file.  In first instance this is a perfectly working solution, however, if you execute the script hundreds of times per minute the end 2 end execution of the script might slow down due to the fact that the log writing is developed in an inline manner.

Forked logging 
When developing code, even if you develop a small script in bash, it is good practice to ensure that your main code flow will not have to wait for the lines of logging to be written to file on the file system. In the example below you will see an implementation where the main flow of your code will for a secondary process which will take care of writing the logging to the logfile. 


By using this way of developing your code will execute the two functions that represent an actual step as one process on the operating system in a sequential manner. Every time there is the need to write logging to the logfile a new process is forked. The benefit of this is that your main flow of code will not wait until the “write to logfile” process is completed, it will directly go to the next step in the script.

In cases where you have congestion in writing to the file this implementation will ensure your main process will not have any delay in execution. The forked process that will take care writing to the file will experience the delay however main execution times will improve.

Code example of forked bash processes
If we take the below code as a starting point of the example. We have three main steps in the code, all three basically do the same. In a real world situation step 1 and step 3 would be “real” code execution while step 2 is writing to the log. The sleep commands are included to showcase delay of the system.

#!/bin/bash

sleep 1
echo "step 1 : $(date)" >> ./result.txt

sleep 5
echo "step 2 : $(date)" >> ./result.txt

sleep 1
echo "step 3 : $(date)" >> ./result.txt

As you can see step 1 and step 3 take 1 second to complete while step 2 (writing to the log file) takes 5 seconds. If we run the script and read the content of ./result.txt we will see the following:

[opc@a0d544 test]$
[opc@a0d544 test]$ ./0_example.sh
[opc@a0d544 test]$ cat result.txt
step 1 : Thu Feb 16 17:19:49 EST 2017
step 2 : Thu Feb 16 17:19:54 EST 2017
step 3 : Thu Feb 16 17:19:55 EST 2017 
[opc@a0d544 test]$

The above is perfectly explainable and as expected. However, we want to ensure that the code runs faster, or in other words, we want the main sequential flow to finish faster while we do not really worry that the logging lags a bit behind. In case that the logging is experiencing issues with I/O performance you do not want to wait for this. Also, if your logging would be done by using a curl command to a central REST based logging server you do not want to wait for that.

In the below example we show a form of forking in bash, there are other ways of doing it, however this works for example purposes. The code is shown below:

#!/bin/bash

function step2 () {
 sleep 5
 echo "step 2 : $(date)" >> ./result.txt

}

sleep 1
echo "step 1 : $(date)" >> ./result.txt

( step2 ) &

sleep 1
echo "step 3 : $(date)" >> ./result.txt

As you can see, we have placed step two in a function and we call the function in a bit of a different manner than we would normally do. This has the result that the function is executed in a sub-shell and the main sequential code flow will continue without waiting for the sub-shell to finish the execution of the function.

If we execute this and look at the results we see a somewhat different result than in the first test. We see that the first line is for step one at :49 and the second line is for step three at :08. This is also the moment the script finished. However, at :13 we have step two reporting to the file while the initial script already finished.

[opc@a0d544 test]$ ./1_example.sh
[opc@a0d544 test]$ cat result.txt
step 1 : Thu Feb 16 17:20:08 EST 2017
step 3 : Thu Feb 16 17:20:09 EST 2017
step 2 : Thu Feb 16 17:20:13 EST 2017
[opc@a0d544 test]$

By implementing the calling of step two in a different manner we ensured that the execution of the main sequential flow improved with 5 seconds. Even though you might not expect this kind of delay there are improvements that can be made to the script execution speed when adopting solutions like this.

Some considerations
In cases where time stamping of your logs is extremely critical and/or in cases where a expect congestion in writing to files it is good practice to ensure that the timestamp is taken on the moment that the process is forked and not the moment the line is written to the file. More precisely, the main flow should take the timestamp and hand this over to the forked process in combination with the information that needs to be written to the file.

This way you are sure that the timestamp of the actual occurrence is written to the log and the timestamp when the information was written to the logfile. 

No comments: