Wednesday, May 24, 2017

Oracle Linux - capture context switching in Linux

Before we dive into the subject: context switching is normal. The Linux operating system needs context switching to function, so there is no need to worry. The question is, if it is normal, why would you want to monitor it? Because, as with everything, normal behavior is fine, but behavior that gets out of bounds will cause issues. You can expect a certain number of context switches at any given moment; however, when the number of context switches gets out of hand, this can result in slow execution of processes for the users of the system.

Definition of context switching
The definition of context switching given by the Linux Information Project is as follows: “A context switch (also sometimes referred to as a process switch or a task switch) is the switching of the CPU (central processing unit) from one process or thread to another. A process (also sometimes referred to as a task) is an executing (i.e., running) instance of a program. In Linux, threads are lightweight processes that can run in parallel and share an address space (i.e., a range of memory locations) and other resources with their parent processes (i.e., the processes that created them).”

A context switch comes with a cost: it takes time and capacity to perform the context switch. This means that every context switch you can prevent helps the overall performance of the system. Context switches come in two types: voluntary context switches and non-voluntary context switches.

Voluntary context switches
A running process can decide to initiate a context switch itself. If the decision is made by the code itself we talk about a voluntary context switch (voluntary_ctxt_switches). For example, a process can voluntarily give up its execution time by calling sched_yield, or it can go to sleep while waiting for some event to happen.

Additionally, a voluntary context switch will happen when your computation completes before the allocated timeslice expires.

All of this is acceptable when used in the right manner and when you are aware of the costs of a context switch.

Non-voluntary context switches
Next to the voluntary context switches we have the non-voluntary context switches (nonvoluntary_ctxt_switches). A non-voluntary context switch happens when the scheduler takes the CPU away from a task, for example because a higher-priority task becomes runnable or because the task did not complete within its timeslice. In that case the state of the task is saved and a non-voluntary context switch happens.

Prevent context switching
When developing high performance computing solutions you should, at the very least, be aware of context switching and take it into account. Even better, try to minimize the number of voluntary context switches and try to find the cause of every non-voluntary context switch.
As context switching comes with a cost, you want to minimize it as much as possible. When a non-voluntary context switch happens, the state needs to be saved and the task is placed back in the scheduler queue, where it has to wait again for an execution timeslice. This slows down the overall performance of your system, and the specific code you have written slows down even more.

Check proc context switches
When working on Linux (we are using Oracle Linux in this example, however this applies to most distributions) you can check information on context switches by looking at the status file located at /proc/{PID}/status. In the example below we check the voluntary and non-voluntary context switches of PID 25334.

[root@ce /]#
[root@ce /]# cat /proc/25334/status | grep _ctxt_
voluntary_ctxt_switches: 687
nonvoluntary_ctxt_switches: 208
[root@ce /]#

As you can see, the number of voluntary context switches is (at this moment) 687 and the number of non-voluntary context switches is 208. Both counters are cumulative since the process started. This is a quick and dirty way of determining the number of context switches that a specific PID has had up to a specific moment.
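If you want to use these counters in a script, the same lookup can be done without the cat and grep chain. The below is a minimal sketch, not a definitive implementation; the function names parse_ctxt and ctxt_for_pid are made up for this example.

```shell
#!/usr/bin/env bash
# Parse a /proc/<pid>/status stream on stdin and print
# "<voluntary> <nonvoluntary>" on a single line.
parse_ctxt() {
    awk '
        $1 == "voluntary_ctxt_switches:"    { v = $2 }
        $1 == "nonvoluntary_ctxt_switches:" { n = $2 }
        END { print v + 0, n + 0 }
    '
}

# Hypothetical helper: read the counters of a PID (defaults to this shell).
ctxt_for_pid() {
    parse_ctxt < "/proc/${1:-$$}/status"
}

# Only run the demo where /proc is available (Linux).
if [ -r "/proc/$$/status" ]; then
    ctxt_for_pid "$$"
fi
```

Calling ctxt_for_pid twice with a pause in between and subtracting the values gives you the number of context switches in that interval instead of the total since process start.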

Monitor context switches
You can monitor your systems for context switching, but you will need a good case to do so. Even though it provides information on your system, in most cases and deployments there is no real need to monitor the number of context switches constantly. Having stated that, there are also a lot of cases where monitoring context switching can be vital for ensuring the health of your server and/or for comparing nodes in a wide cluster.

A quick and dirty way of monitoring your context switches is by taking a sample. For example, you could take a sample of the average number of context switches for all processes on your Linux instance that execute a context switch in the sample timeframe.

The below example script takes a 2 second sample of the context switches and outputs only the relevant data. For this we use the pidstat command, which can be installed by installing the sysstat package, available on the Oracle Linux YUM repository.

pidstat -w 2 1 | grep Average | grep -v pidstat | sort -n -k4 | awk '{ if ($2 != "PID") print "ctxt sample:" $2" - "  $3 " - " $4 " - "  $5}'

The full example in our case looks like the one below:

[root@ce tmp]# pidstat -w 2 1 | grep Average | grep -v pidstat | sort -n -k4 | awk '{ if ($2 != "PID") print "ctxt sample:" $2" - "  $3 " - " $4 " - "  $5}'
ctxt sample:12 - 0.50 - 0.00 - watchdog/0
ctxt sample:13 - 0.50 - 0.00 - watchdog/1
ctxt sample:15 - 0.50 - 0.00 - ksoftirqd/1
ctxt sample:18 - 3.00 - 0.00 - rcuos/1
ctxt sample:2183 - 1.00 - 0.00 - memcached
ctxt sample:2220 - 1.00 - 0.00 - httpd
ctxt sample:52 - 1.00 - 0.00 - kworker/1:1
ctxt sample:56 - 1.50 - 0.00 - kworker/0:2
ctxt sample:7 - 14.00 - 0.00 - rcu_sched
ctxt sample:9 - 11.50 - 0.00 - rcuos/0
[root@ce tmp]#

To understand the output we have to look at how pidstat normally provides the output. The below is an example of the standard pidstat output:

[root@ce tmp]# pidstat -w 2 1
Linux 4.1.12-61.1.28.el6uek.x86_64 (  05/23/2017  _x86_64_ (2 CPU)

03:24:37 PM       PID   cswch/s nvcswch/s  Command
03:24:39 PM         3      0.50      0.00  ksoftirqd/0
03:24:39 PM         7     14.43      0.00  rcu_sched
03:24:39 PM         9      9.45      0.00  rcuos/0
03:24:39 PM        18      3.98      0.00  rcuos/1
03:24:39 PM        52      1.00      0.00  kworker/1:1
03:24:39 PM        56      1.49      0.00  kworker/0:2
03:24:39 PM      1557      0.50      0.50  pidstat
03:24:39 PM      2183      1.00      0.00  memcached
03:24:39 PM      2220      1.00      0.00  httpd

Average:          PID   cswch/s nvcswch/s  Command
Average:            3      0.50      0.00  ksoftirqd/0
Average:            7     14.43      0.00  rcu_sched
Average:            9      9.45      0.00  rcuos/0
Average:           18      3.98      0.00  rcuos/1
Average:           52      1.00      0.00  kworker/1:1
Average:           56      1.49      0.00  kworker/0:2
Average:         1557      0.50      0.50  pidstat
Average:         2183      1.00      0.00  memcached
Average:         2220      1.00      0.00  httpd
[root@ce tmp]#

As you can see from the “script” we print $2, $3, $4 and $5 for all average data where $2 is not “PID”. This gives us all the clean data. In our case the columns we show are the following:

$2 – the PID
$3 – voluntary context switches per second (cswch/s), averaged over the sample time
$4 – non-voluntary context switches per second (nvcswch/s), averaged over the sample time
$5 – the command name
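The grep and awk chain can also be folded into a single awk filter. The below is a sketch under the assumption that the input has the same column layout as the pidstat output shown above; the function name ctxt_report is hypothetical.

```shell
#!/usr/bin/env bash
# Read `pidstat -w` output on stdin and print one line per process from the
# "Average:" block, skipping the header row and pidstat itself.
ctxt_report() {
    awk '$1 == "Average:" && $2 != "PID" && $NF != "pidstat" {
        printf "ctxt sample:%s - %s - %s - %s\n", $2, $3, $4, $5
    }'
}

# Usage, assuming the sysstat package is installed:
#   pidstat -w 2 1 | ctxt_report
```

This version keeps the rows in the order pidstat prints them and does no sorting of its own.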

How to use the monitor data
Collecting sample data via monitoring is great, however when the data is not used it is worthless and cannot justify the costs of running the collector. As collecting the number of context switches has a cost, you need to make sure you really need the data. A couple of ways you can use the data are described below, each with its potential value in your maintenance and support effort.

Case 1 - Node comparison
This can be useful when you want to compare nodes in a wider cluster. Checking the number of context switches will be part of a wider set of checks and taking sample data. The number of context switches can be a good datapoint in the overall comparison of what is happening and what the difference between nodes is.
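For a node-level datapoint you do not need per-process numbers. The kernel exposes the total number of context switches since boot on the "ctxt" line of /proc/stat. The below is a minimal sketch that turns two readings into a rate; the function names ctxt_total and ctxt_rate are made up for this example, and the one second interval is just an assumption.

```shell
#!/usr/bin/env bash
# Print the cumulative number of context switches since boot, taken from the
# "ctxt" line of /proc/stat (or of another file passed as $1).
ctxt_total() {
    awk '$1 == "ctxt" { print $2 }' "${1:-/proc/stat}"
}

# Print context switches per second between two readings taken
# <seconds> apart (integer arithmetic).
ctxt_rate() {
    local first="$1" second="$2" seconds="$3"
    echo $(( (second - first) / seconds ))
}

# Demo: sample this node for one second where /proc/stat is available.
if [ -r /proc/stat ]; then
    before=$(ctxt_total)
    sleep 1
    after=$(ctxt_total)
    echo "$(uname -n): $(ctxt_rate "$before" "$after" 1) ctxt/s"
fi
```

Running the same snippet on every node of the cluster gives you one comparable number per node.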

Case 2 - Version comparison
This can be a good solution in cases where you often deploy new versions (builds / deployments) of code to your systems and want to track subtle changes in how the systems behave.

Case 3 – Outlier detection
Outlier detection can be used to detect subtle changes in the way the system behaves over time. You can couple this to machine learning to detect changes over time. A changing number of context switches can be an indicator of a number of things and can be a pointer for a deeper investigation to tune your code.

Case 4 – (auto) scaling
The number of context switches, in combination with other datapoints, can be input for scaling the number of nodes up and down. In general this is coupled with CPU usage, transaction timing and others. Adding context switching as an additional datapoint can be very valuable.

The site reliability engineering way
You can adopt the above in your SRE (site reliability engineering) strategy as one of the inputs to monitor your systems, automatically detect trends, prevent potential issues and give feedback to developers on certain behaviour of the code in real production deployments.

Tuesday, May 09, 2017

Oracle Linux - Installing dtrace

When checking the description of dtrace for Oracle Linux on the Oracle website we can read the following: "DTrace is a comprehensive, advanced tracing tool for troubleshooting systematic problems in real time.  Originally developed for Oracle Solaris and later ported to Oracle Linux, it allows administrators, integrators and developers to dynamically and safely observe live systems for performance issues in both applications and the operating system itself.  DTrace allows you to explore your system to understand how it works, track down problems across many layers of software, and locate the cause of any aberrant behavior.  DTrace gives the operational insights that have long been missing in the data center, such as memory consumption, CPU time or what specific function calls are being made."

Which sounds great, and to be honest, using dtrace helps enormously in finding and debugging issues on your Oracle Linux system in cases where you need to go one level deeper than you would normally go to find an issue.

Downloading dtrace
If you want to install dtrace, one way you can do this is by downloading the files from the Oracle website. You can find the two RPMs at this location.

Installing dtrace
When installing dtrace you might run into some dependency issues that are not that obvious to resolve. Firstly, the two RPMs have a dependency on each other. This means you will have to install the RPM files in the right order. You can see this below:

[root@ce vagrant]# rpm -ivh dtrace-utils-0.5.1-3.el6.x86_64.rpm 
error: Failed dependencies:
 cpp is needed by dtrace-utils-0.5.1-3.el6.x86_64
 dtrace-modules-shared-headers is needed by dtrace-utils-0.5.1-3.el6.x86_64
 libdtrace-ctf is needed by dtrace-utils-0.5.1-3.el6.x86_64
[root@ce vagrant]# rpm -ivh dtrace-utils-devel-0.5.1-3.el6.x86_64.rpm 
error: Failed dependencies:
 dtrace-modules-shared-headers is needed by dtrace-utils-devel-0.5.1-3.el6.x86_64
 dtrace-utils(x86-64) = 0.5.1-3.el6 is needed by dtrace-utils-devel-0.5.1-3.el6.x86_64
 libdtrace-ctf-devel > 0.4.0 is needed by dtrace-utils-devel-0.5.1-3.el6.x86_64
[root@ce vagrant]# 

As you can see, you also have a number of other dependencies. The easiest way to resolve this is to simply use YUM to install both RPMs from your local machine and leverage the power of YUM to install the rest of the dependencies. For this we will use the yum localinstall dtrace-utils-* command.

Now we can quickly check if dtrace is indeed installed by executing the dtrace command without any specific option. You should see the below on your terminal:

[root@ce vagrant]# dtrace
Usage: dtrace [-32|-64] [-aACeFGhHlqSvVwZ] [-b bufsz] [-c cmd] [-D name[=def]]
 [-I path] [-L path] [-o output] [-p pid] [-s script] [-U name]
 [-x opt[=val]] [-X a|c|s|t]

 [-P provider [[ predicate ] action ]]
 [-m [ provider: ] module [[ predicate ] action ]]
 [-f [[ provider: ] module: ] func [[ predicate ] action ]]
 [-n [[[ provider: ] module: ] func: ] name [[ predicate ] action ]]
 [-i probe-id [[ predicate ] action ]] [ args ... ]

 predicate -> '/' D-expression '/'
    action -> '{' D-statements '}'

 -32 generate 32-bit D programs and ELF files
 -64 generate 64-bit D programs and ELF files

 -a  claim anonymous tracing state
 -A  generate driver.conf(4) directives for anonymous tracing
 -b  set trace buffer size
 -c  run specified command and exit upon its completion
 -C  run cpp(1) preprocessor on script files
 -D  define symbol when invoking preprocessor
 -e  exit after compiling request but prior to enabling probes
 -f  enable or list probes matching the specified function name
 -F  coalesce trace output by function
 -G  generate an ELF file containing embedded dtrace program
 -h  generate a header file with definitions for static probes
 -H  print included files when invoking preprocessor
 -i  enable or list probes matching the specified probe id
 -I  add include directory to preprocessor search path
 -l  list probes matching specified criteria
 -L  add library directory to library search path
 -m  enable or list probes matching the specified module name
 -n  enable or list probes matching the specified probe name
 -o  set output file
 -p  grab specified process-ID and cache its symbol tables
 -P  enable or list probes matching the specified provider name
 -q  set quiet mode (only output explicitly traced data)
 -s  enable or list probes according to the specified D script
 -S  print D compiler intermediate code
 -U  undefine symbol when invoking preprocessor
 -v  set verbose mode (report stability attributes, arguments)
 -V  report DTrace API version
 -w  permit destructive actions
 -x  enable or modify compiler and tracing options
 -X  specify ISO C conformance settings for preprocessor
 -Z  permit probe descriptions that match zero probes
[root@ce vagrant]# 

You are now all set to start with dtrace on your Oracle Linux instance. In addition, you will also have to install the dtrace kernel modules package that matches the kernel your specific machine is running:

yum install dtrace-modules-`uname -r`
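The backticks in the command above substitute the release of the running kernel into the package name. The below is a small sketch that makes this step explicit so you can check the name before installing; the function name dtrace_modules_pkg is hypothetical, and whether a package with that name exists for your kernel depends on your repository.

```shell
#!/usr/bin/env bash
# Build the kernel-specific package name; defaults to the running kernel
# as reported by `uname -r`.
dtrace_modules_pkg() {
    printf 'dtrace-modules-%s\n' "${1:-$(uname -r)}"
}

pkg=$(dtrace_modules_pkg)
echo "package for this kernel: $pkg"

# To install it (as root), assuming the package is available to yum:
#   yum install "$pkg"
```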

Oracle Linux - using pstree to find processes

When checking which processes are running on your Oracle Linux instance you can use the ps command. The ps command is most likely the most used command to find processes, and for good reasons, as it is very easy to use. However, when you want some more insight and an easier view of what is related to what, the pstree command can be very useful.

pstree shows running processes as a tree. The tree is rooted at either pid or init if pid is omitted. If a user name is specified, all process trees rooted at processes owned by that user are shown. pstree visually merges identical branches by putting them in square brackets and prefixing them with the repetition count. As an example you can see the below standard output of pstree without any additional options specified:

[root@ce tmp]#
[root@ce tmp]#  pstree
[root@ce tmp]#
[root@ce tmp]# 

As you can see in the above example httpd is shown between brackets and prefixed with the number 10. This means that 10 identical httpd processes have been merged into a single branch. Below is shown a part of the full tree (removed the lower part for readability):

[root@ce tmp]# pstree -p
        │                   ├─{VBoxService}(1186)
        │                   ├─{VBoxService}(1187)
        │                   ├─{VBoxService}(1188)
        │                   ├─{VBoxService}(1189)
        │                   ├─{VBoxService}(1190)
        │                   └─{VBoxService}(1191)
        │             ├─httpd(3617)
        │             ├─httpd(3618)
        │             ├─httpd(3619)
        │             ├─httpd(3620)
        │             ├─httpd(3621)
        │             ├─httpd(3622)
        │             ├─httpd(3623)
        │             ├─httpd(5020)
        │             └─httpd(5120)
        │            ├─{java}(3997)
        │            ├─{java}(3998)

This shows the main process (PID 3612) and all other processes that are forked from this process. pstree is primarily (in my opinion) a tool to support you when doing some investigation on a machine and is not by default the best tool to use when scripting solutions on Oracle Linux. Having stated that, it is a great tool to use.
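When you do need the parent/child relation inside a script, one option is to read it straight from /proc, where every /proc/<pid>/status file carries a PPid line. The below is a minimal sketch; the function name children_of and the optional second argument (an alternative proc root, handy for testing) are assumptions made for this example. It lists direct child processes only, not the threads that pstree shows between curly braces.

```shell
#!/usr/bin/env bash
# Print the PIDs of all direct children of the given parent PID by reading
# the "PPid:" line of every <root>/<pid>/status file (root defaults to /proc).
children_of() {
    local parent="$1" root="${2:-/proc}" f ppid
    for f in "$root"/[0-9]*/status; do
        # A process may exit while we iterate; skip files that are gone.
        [ -r "$f" ] || continue
        ppid=$(awk '$1 == "PPid:" { print $2; exit }' "$f" 2>/dev/null)
        if [ "$ppid" = "$parent" ]; then
            f=${f%/status}
            echo "${f##*/}"
        fi
    done
}

# Demo: list the direct children of PID 1.
children_of 1
```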

Monday, May 08, 2017

Oracle Linux - get your external IP in bash with

Developing code that helps you with automatic deployments can be a big timesaver. Automating repetitive tasks such as installing servers, configuring them and deploying code on them is more and more adopted by enterprises as part of DevOps, continuous integration and continuous deployment. When you script automatic deployments in your own datacenter, the beauty is that you know fairly well what the infrastructure looks like and how, for example, your machine will be accessible from the outside world. If you deploy a server that has an external IP address on the outside of the network edge, you should be able to determine this address relatively easily, even in cases where this IP is not the IP of your actual machine.

If, however, you distribute your scripts to others, you will not be able to apply the logic you might apply in your own network. For this you need some way to find out the external IP address. And, as stated, this can be totally different from the IP which the machine has from the point of view of the local operating system.

The people at have done some great work by providing a quick service to resolve this problem. provide a service that will provide you all the information you need (and more) in a manner that is easily included in bash scripting.

As an example, in case you would need your external IP to use in a configuration in your Oracle Linux deployment you could execute the below command:

[root@ce tmp]# curl
[root@ce tmp]#

(Do note, this is not my IP, as I do not use a Google webhost as one of my test machines.) As you can see this is relatively easy. To provide an example of how you could include this in a bash script you can review the below code snippet:


 myIp=`curl -s`

 echo "$myIp"
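Because the value comes from an external service, it can be worth checking that it really looks like an IP address before you write it into a configuration file. The below is a hedged sketch for IPv4 only; the function name is_ipv4 is made up for this example and 203.0.113.7 is a documentation address, not a real host.

```shell
#!/usr/bin/env bash
# Return 0 when the argument is a dotted quad with four numeric
# parts, each between 0 and 255; return 1 otherwise.
is_ipv4() {
    printf '%s\n' "$1" | awk -F. '
        NF != 4 { exit 1 }
        { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i > 255) exit 1 }
    '
}

# Demo with one valid and one invalid candidate value.
for candidate in "203.0.113.7" "not-an-ip"; do
    if is_ipv4 "$candidate"; then
        echo "$candidate looks like an IPv4 address"
    else
        echo "$candidate rejected" >&2
    fi
done
```

In a deployment script you would call is_ipv4 "$myIp" right after the curl and stop when it fails.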

And, even though this is a very quick and easy solution to a problem you could face when you try to automate a number of steps while scripting, the provides more options. A number of options to get information from the "external" view are available and can all be found at the website. However, the most important one is the ability to do a curl to which will return a JSON based response with all the information in it. This makes it parsable. And to make it easier, Oracle has included jq in the YUM repository, which makes parsing JSON even easier. An example of the JSON response from is shown below (again, using a fake Google webhost and not my own private information).

{
 "connection": "",
 "ip_addr": "",
 "lang": "",
 "remote_host": "",
 "user_agent": "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.21 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2",
 "charset": "",
 "port": "54944",
 "via": "",
 "forwarded": "",
 "mime": "*/*",
 "keep_alive": "",
 "encoding": ""
}
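To pick a single value out of a response like this, you can pipe it through jq. The below is a minimal sketch that only assumes the ip_addr field name shown in the response above; the function name extract_ip is hypothetical and the inline document with the documentation address 203.0.113.7 stands in for the real service response.

```shell
#!/usr/bin/env bash
# Read a JSON document on stdin and print its "ip_addr" field.
# Requires jq; returns non-zero when jq is not installed.
extract_ip() {
    command -v jq >/dev/null 2>&1 || { echo "jq is required" >&2; return 1; }
    jq -r '.ip_addr'
}

# Demo with an inline document instead of a live request.
if command -v jq >/dev/null 2>&1; then
    printf '{"ip_addr": "203.0.113.7", "port": "54944"}\n' | extract_ip
fi
```

In a real script you would pipe the output of your curl -s call into extract_ip instead of the inline document.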

Thursday, May 04, 2017

Oracle Linux - find files after yum installation

Installing software on Oracle Linux is relatively easy using the yum command. The downside of the ease of installing is that you get a lot of files "dumped" on the filesystem without keeping clear track of what exactly is installed where. Keeping your filesystem clean and understanding what ends up in which location is vital for maintaining a good working Linux instance.

A couple of options are available to keep track of what is installed where and provide you with a list of files which shows where things ended up on the filesystem.

The first option is making use of the repoquery utility. As repoquery is part of the yum-utils package, you will have to ensure that you have installed yum-utils. You can check this by checking if you have the repoquery utility (which is a good hint) or by using the yum command as shown below:

[root@ce ~]# yum list installed yum-utils
Loaded plugins: security
Installed Packages
yum-utils.noarch                                                                                    1.1.30-40.0.1.el6                                                                                    @public_ol6_latest
[root@ce ~]#

If you have the repoquery utility you can use it to find out which files are installed into which location. An example of this is shown below, where we check what was installed, and where, when we did the installation of yum-utils:

[root@ce ~]# 
[root@ce ~]# repoquery --installed -l yum-utils
[root@ce ~]# 
[root@ce ~]# 

This will help you to keep track of what is installed in which location and can support in ensuring you have a clean system.
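A long file list is easier to read when it is grouped per directory. The below is a small sketch that counts the files per location; the function name summarize_dirs and the choice to group on the first two path components are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Read absolute file paths on stdin and print "<count> <directory>" per
# directory, grouping on at most the first two path components.
summarize_dirs() {
    awk -F/ '
        /^\// {
            dir = "/" $2
            if (NF > 3) dir = dir "/" $3
            count[dir]++
        }
        END { for (d in count) print count[d], d }
    ' | LC_ALL=C sort -k2
}

# Usage, assuming yum-utils is installed:
#   repoquery --installed -l yum-utils | summarize_dirs
# Or, using rpm directly:
#   rpm -ql yum-utils | summarize_dirs
```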