Johan Louwers - Tech blog: September 2012

Sunday, September 23, 2012

Oracle needs more Linux developers

In a recent interview with internetnews.com, Wim Coekaerts, the Oracle VP of Linux development, responds to a statement made by Linus Torvalds. At LinuxCon USA Linus Torvalds stated that the Linux community is lacking kernel developers. In the interview Wim Coekaerts says this is partially true, goes into detail on how this situation came about, and offers some solutions.

Wim Coekaerts states that there are enough people who would like to contribute to the Linux community but who are put off by the inner workings of the community. As an example he states that if a new developer submits a patch to the kernel mailing lists and it is not done to the letter of how it should be done, the new contributor is fried by the community. This is so discouraging that many developers, after their first experience of submitting a patch for a Linux issue, will not send in a second patch. I do not think there is a true track record or hard figures on this; however, I can imagine that Wim is correct in his statement.

In the interview he proposes that the Linux community start a channel for new kernel developers where they can ask questions and are encouraged to become more knowledgeable Linux developers. I very much support this idea; however, I also have a feeling that this is a spot where Oracle and the Linux development team of Wim Coekaerts could take a role. Oracle is promoting the use of Oracle Linux more and more and depends for a large part on community-built software. They do build their own patches and "pay" the community back in this way. However, if Oracle feels strongly about open source software development and the development of Linux, they could take a role in setting up a learning and training channel for kernel developers which should be available to everyone. This way the Linux development community benefits, and Oracle will have the benefit of (A) more good developers adding to the kernel and (B) having the first contact with promising developers. I am wondering if Oracle and the team of Wim Coekaerts are willing to take up this challenge.

Thursday, September 20, 2012

Mobile systems in Africa

When we talk about mobile systems we automatically think about smartphones that connect via the internet to bring services to the customer. We see an explosion of apps, and more and more companies are starting mobile channels for sales and services. Especially in Europe and the US we see the trend picking up, and now also in Asia (Japan was already heavily involved in mobile). However, what we sometimes forget is that Africa is a large and upcoming market.

For example, Africa is the number one continent for mobile banking, and a mobile banking revolution is taking place there.

Millions of Africans are using mobile phones to pay bills, move cash and buy basic everyday items. So why has a form of banking that has proved a dead duck in the West been such a hit across the continent? It has been estimated that there are a billion people around the world who lack a bank account but own a mobile. Africa has the fastest-growing mobile phone market in the world and most of the operators are local firms.

One thing to consider, however, is that in Africa only around 1% of mobile phones are smartphones. Because of this the app revolution is not picking up (yet). However, Africans are also very eager to work in a mobile way. Francis Pisani explains some of the mobile innovations currently happening in this Capgemini video.

Wednesday, September 19, 2012

Share healthcare data

One of the most important things to most people is their health. The western world is spending fortunes on medical care and on inventing new medicines and treatments every day. We do however see that one area where a lot of good can be done is still lagging behind: the way we handle data in healthcare systems. Several countries have tried to implement national systems to handle and share patient data. Most of them have failed. There are several reasons for these failures; security and privacy are the ones that come directly to mind.

When you are handling personal data, and maybe especially when handling personal medical data, you have to be extremely careful that the data is used for the correct purpose and does not fall into the wrong hands. However, having the option to quickly share data about patients can save lives and can bring down the medical costs of handling patients.

Currently we are spending money on redoing tests, finding data lost during transfer and making the wrong decisions on a daily basis. Mark Blatt, Worldwide Medical Director at Intel, has some thoughts on this. One of the slides from a presentation given by Mark Blatt shows the collaborative workflows that could be used in the medical industry.

Most of us will agree that sharing medical information will help the medical system and will benefit the patient. What most of us do not agree upon is how this should be implemented, especially with regard to patient confidentiality and privacy. Finding the right way of doing this is, in my opinion, one of the big challenges that the IT industry will have to take upon itself together with the medical industry and governments. We all recognize that medical costs are skyrocketing and that, if we want to be able to make use of medical care at a price we can pay, we have to find ways to make it less expensive. In my opinion sharing data can be a big step in this process of cutting down costs.

Below is part of a presentation given by Mark Blatt:

Tuesday, September 18, 2012

Linux print file reversed

Within Linux you have several commands that will read a file and write its contents to your screen, or you can redirect the output into another file if that is what you need. However, from time to time you might want to read the file in reversed order, for example if you have an application that is writing information to a file and you have to feed this file in reversed order into an input file for another program or load it into a database.

As with most things in Linux there are several ways to do this. You could read the number of lines by using wc -l and use this number in a loop in a bash script to print the file line by line in reversed order.
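The loop-based approach could look roughly like the sketch below. The function name reverse_with_loop is made up for illustration, and the sketch assumes a POSIX shell with wc and sed available and a file that ends with a newline.

```shell
# Print a file in reversed order using a line count and a loop.
# reverse_with_loop is a hypothetical helper name, not a standard command.
reverse_with_loop() {
    file=$1
    # Count the lines. Reading via stdin keeps the file name out of the
    # output of wc; the arithmetic expansion strips any padding.
    total=$(( $(wc -l < "$file") ))
    n=$total
    while [ "$n" -ge 1 ]; do
        # Print only line number n of the file.
        sed -n "${n}p" "$file"
        n=$((n - 1))
    done
}
```

You would call it as reverse_with_loop app.log > reversed.log, where both file names are placeholders. Note that the file is read again for every single line, so this gets slow on large files.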

However, as stated, there are more ways to do this. The easiest way to read a file in reversed order in Linux is to make use of the tac command. The tac command will do exactly as described above without any line of coding or any other smart trick. It just reads a file and prints the content in reversed order to your screen (or you can redirect it into another file).

NAME
       tac - concatenate and print files in reverse

SYNOPSIS
       tac [OPTION]... [FILE]...

DESCRIPTION
       Write each FILE to standard output, last line first.  With no FILE, or when FILE is -, read standard input.

       Mandatory arguments to long options are mandatory for short options too.

       -b, --before
              attach the separator before instead of after

       -r, --regex
              interpret the separator as a regular expression

       -s, --separator=STRING
              use STRING as the separator instead of newline

       --help display this help and exit

       --version
              output version information and exit
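To make this concrete, here is a small sketch of tac in use. The sample content and the temporary files are made up for illustration; only the behaviour described in the man page excerpt above is assumed.

```shell
# Create a small sample file (hypothetical content, for illustration only).
sample=$(mktemp)
printf 'first\nsecond\nthird\n' > "$sample"

# Print the file to the screen, last line first.
tac "$sample"

# Write the reversed content into another file instead of the screen.
reversed=$(mktemp)
tac "$sample" > "$reversed"

# With no FILE given, tac reads standard input.
printf 'a\nb\nc\n' | tac

# Clean up the temporary files.
rm -f "$sample" "$reversed"
```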

Monday, September 17, 2012

Linux whatis command

Linux and the Linux commands have a long history. Some of the commands currently used within mainstream Linux originate from the first UNIX(-like) systems. The open source nature of Linux also adds to the list of commands and utilities that end up in the Linux distributions. This sometimes makes it hard to keep track of commands that you do not use every day. The most common commands you use for your day-to-day work you probably know by heart; however, every now and then you would like an answer to the question "what is this command". For this very purpose the whatis command is included in most Linux distributions. It does exactly what its name indicates.

The average man page has the following parts:

Name
Synopsis
Description
Author
See also

When you look into the man page all this information is sent to your screen while you simply want to know what this command is. For example, if you query the man page of the ls command you will get something like this (and more):

NAME
       ls - list directory contents

SYNOPSIS
       ls [OPTION]... [FILE]...

DESCRIPTION
       List information about the FILEs (the current directory by default).  Sort entries alphabetically if none of -cftuvSUX nor --sort is specified.

       Mandatory arguments to long options are mandatory for short options too.

       -a, --all
              do not ignore entries starting with .

       -A, --almost-all
              do not list implied . and ..

       --author
              with -l, print the author of each file

       -b, --escape
              print C-style escapes for nongraphic characters
.........................
.........................
.........................

In some cases this is way too much information and you would be happy with only the part that is in the name section, because this gives you more than enough information. If you run whatis ls you will get the following:

johan@linux-main:~$ whatis ls
ls (1)               - list directory contents
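To illustrate the idea of keeping only the name section, the sketch below filters man page text down to the NAME part. This is not how whatis works internally (as far as I know it answers from a prebuilt index of man page descriptions); name_section is a hypothetical helper used only to show the principle.

```shell
# name_section is a hypothetical helper: it reads man page text on stdin
# and prints only the lines that belong to the NAME section.
name_section() {
    awk '
        /^NAME/       { in_name = 1; next }   # start after the NAME heading
        /^[A-Z]/      { in_name = 0 }         # any other heading ends it
        in_name && NF { sub(/^[ \t]+/, ""); print }
    '
}
```

On a system with man pages installed you could try man ls | col -b | name_section, where col -b is only there to strip possible formatting characters from the man output.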

Sunday, September 16, 2012

Understanding Parallel Computing

Some time ago I posted a blogpost named "The future of computing is parallelism". In it I discussed the future of computing and the large role parallelism will play in future computing and system design. Below you can find two videos on Understanding Parallel Computing and Amdahl's Law.

Amdahl's law, also known as Amdahl's argument, is named after computer architect Gene Amdahl, and is used to find the maximum expected improvement to an overall system when only part of the system is improved. It is often used in parallel computing to predict the theoretical maximum speedup using multiple processors. It was presented at the AFIPS Spring Joint Computer Conference in 1967.

The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program. For example, if a program needs 20 hours using a single processor core, and a particular portion of 1 hour cannot be parallelized, while the remaining promising portion of 19 hours (95%) can be parallelized, then regardless of how many processors we devote to a parallelized execution of this program, the minimum execution time cannot be less than that critical 1 hour. Hence the speedup is limited up to 20×, as the diagram illustrates.

source: wikipedia
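The 20× limit in the example can be checked against the usual statement of Amdahl's law. The symbols are introduced here for illustration and do not come from the quoted text: p is the fraction of the program that can be parallelized, N the number of processors and S(N) the resulting speedup.

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

With 19 of the 20 hours parallelizable, p = 0.95, so the limit is 1 / (1 - 0.95) = 20, which matches the example.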

Deploy Oracle RAC on Oracle VM

Oracle VM is the Xen implementation from Oracle which can be used as a hypervisor for virtualizing operating systems. Oracle VM could (should) be your standard choice when deploying virtualized environments that will be used to run Oracle products. The reason for this is that Oracle only recognizes the hard partitioning of Oracle VM as a valid option with respect to their license policy.

Oracle provides product certification on Oracle VM for a lot of their products and the list is growing. On the list of certified products you can also find Oracle RAC for production and non-production use. Oracle even provides templates which ship Oracle Linux and an Oracle RAC implementation for quick deployment.

There are however some not so well documented things you have to consider when you deploy a RAC implementation on Oracle VM. A good read is the "Oracle Real Application Clusters in Oracle VM Environments" whitepaper from Oracle, which was released in March 2012.

Within this whitepaper the following is stated:
"Specifically, for mission-critical, production deployments it is unsupported to co-locate Oracle VM guests hosting instances of the same Oracle RAC database on a single OVS physical server as a result of Oracle VM guest failover or automated DRS placement policy. Any Oracle VM guest failover or DRS placement policies must respect this fundamental RAC instance placement rules."

This means that, based upon this statement, if you wanted to deploy a 10 node production RAC implementation you cannot consolidate this on (let's say) 5 physical servers. You would need at least 10 physical servers and potentially 11. The 11th is needed if you want to be able to do a live migration to another physical server using Oracle VM and still be within the supported options. If you only had 10 and did a live migration, you would have to migrate the guest to one of the other 9 physical servers, each of which already hosts an instance of the same database, and at that moment you would no longer be in a supported setup.
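The placement rule can be expressed as a simple check. The sketch below is my own illustration and not an Oracle tool: check_rac_placement is a hypothetical helper, and the input format (database, guest, physical server per line) as well as names like PROD and ovs01 are assumptions made for the example.

```shell
# check_rac_placement is a hypothetical helper. It reads lines of the form
#   <database> <guest> <physical-server>
# on stdin and reports every physical server that hosts more than one guest
# of the same RAC database. The exit status is 0 when the rule is respected.
check_rac_placement() {
    awk '
        { count[$1 " on " $3]++ }    # count guests per database and server
        END {
            bad = 0
            for (key in count) {
                if (count[key] > 1) {
                    print "violation: " key " has " count[key] " instances"
                    bad = 1
                }
            }
            exit bad
        }
    '
}
```

For example, printf 'PROD guest1 ovs01\nPROD guest2 ovs01\n' | check_rac_placement reports a violation, because two instances of the same database share one physical server.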

Thursday, September 13, 2012

Get Linux file system blocksize

In a previous post I discussed the topic of blocksize for an Oracle database. As you have seen in that post it can be important to understand which blocksize to select when creating your database. When talking about blocksize it is also good to know what the blocksize is on your file system. Your database and the way it requests blocks from disk is related to the blocksize of the file system mounted on your operating system.

If you want to know the blocksize of your file system you can check this in multiple ways. One of the most commonly used ways is to execute the below command:

dumpe2fs /dev/sda1 | grep -i 'Block size'

This calls dumpe2fs, which prints the superblock and block group information for the filesystem present on the device. In the above example we gather the information for /dev/sda1. Reading the device usually requires root privileges.

As you can see we do a grep on 'Block size'. If you do not grep for this you will get a LOT of information. The most useful part for most users is the first part. You can simply pipe the output into more to read this information in the easiest way:

dumpe2fs /dev/sda1 | more

As an example you can see the output of this below, which is the information from one of my workstations:

dumpe2fs 1.42 (29-Nov-2011)
Filesystem volume name:   <none>
Last mounted on:          /
Filesystem UUID:          584232d2-3bc8-41bb-9534-c3a4f4f6e64c
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              91054080
Block count:              364187648
Reserved block count:     18209382
Free blocks:              337917734
Free inodes:              90742705
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      937
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Sat Dec 31 17:42:47 2011
Last mount time:          Sun Sep  9 20:37:55 2012
Last write time:          Wed Apr 11 17:52:11 2012
Mount count:              11
Maximum mount count:      21
Last checked:             Wed Apr 11 17:52:11 2012
Check interval:           15552000 (6 months)
Next check after:         Mon Oct  8 17:52:11 2012
Lifetime writes:          225 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
First orphan inode:       70780693
Default directory hash:   half_md4
Directory Hash Seed:      c55ef959-1938-4934-b178-eb9409c00d0b
Journal backup:           inode blocks
Journal features:         journal_incompat_revoke
Journal size:             128M
Journal length:           32768
Journal sequence:         0x00102c0c
Journal start:            30520
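If you only need the number and do not want to go through the full dumpe2fs output, asking the mounted file system is an alternative. The sketch below assumes GNU coreutils stat; fs_block_size is a hypothetical helper name. On ext file systems the value normally matches the "Block size" line shown above, and no root privileges are needed.

```shell
# fs_block_size is a hypothetical helper that prints the fundamental block
# size, in bytes, of the file system holding the given path (default: /).
# It assumes GNU coreutils stat; other stat implementations use other flags.
fs_block_size() {
    stat -f -c '%S' "${1:-/}"
}

# Show the block size of the root file system.
fs_block_size /
```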

Sunday, September 09, 2012

Oracle database blocksize considerations

When creating an Oracle database (or new database file) most people quickly move past the question concerning the database blocksize. In most cases the standard Oracle database blocksize is selected, which is 8KB (8192 bytes) or 4KB (4096 bytes). In general this might be a good choice; however, it can be good to consider or reconsider your blocksize when you have a highly demanding environment where database performance is of the essence.

You have to make some considerations and you have to know the advantages and disadvantages of larger or smaller blocksizes for your Oracle database. From the Oracle 11.2 Database Performance Tuning Guide we can learn the following:

Smaller blocksize.
Advantages;

Good for small rows with lots of random access.
Reduces block contention.

Disadvantages;

Has relatively large space overhead due to metadata (that is, block header).
Not recommended for large rows. There might only be a few rows stored for each block, or worse, row chaining if a single row does not fit into a block.

Larger blocksize.
Advantages;

Has lower overhead, so there is more room to store data.
Permits reading several rows into the buffer cache with a single I/O (depending on row size and block size).
Good for sequential access or very large rows (such as LOB data).

Disadvantages;

Wastes space in the buffer cache, if you are doing random access to small rows and have a large block size. For example, with an 8 KB block size and 50 byte row size, you waste 7,950 bytes in the buffer cache when doing random access.
Not good for index blocks used in an OLTP environment, because they increase block contention on the index leaf blocks.

Oracle provides some guidelines for making this decision based upon whether you have many read operations or many write operations. Remember that you can set your blocksize per tablespace, so when you are designing an architecture for your database you might consider placing write-heavy data in a different tablespace (and thus in different datafiles) than most of your read-heavy data. You will have to make some considerations here and this might have an effect on your application design; however, it is worth looking at your options.

Read operations;
Regardless of the size of the data, the goal is to minimize the number of reads required to retrieve the desired data.

If the rows are small and access is predominantly random, then choose a smaller block size.
If the rows are small and access is predominantly sequential, then choose a larger block size.
If the rows are small and access is both random and sequential, then it might be effective to choose a larger block size.
If the rows are large, such as rows containing large object (LOB) data, then choose a larger block size.

Write operations;

For high-concurrency OLTP systems, consider appropriate values for INITRANS, MAXTRANS, and FREELISTS when using a larger block size. These parameters affect the degree of update concurrency allowed within a block. However, you do not need to specify the value for FREELISTS when using automatic segment-space management.
If you are uncertain about which block size to choose, then try a database block size of 8 KB for most systems that process a large number of transactions. This represents a good compromise and is usually effective. Only systems processing LOB data need more than 8 KB.

A larger data block size provides greater efficiency in disk and memory I/O (access and storage of data). Therefore, consider specifying a block size larger than your operating system block size if the following conditions exist:

Oracle Database is on a large computer system with a large amount of memory and fast disk drives. For example, databases controlled by mainframe computers with vast hardware resources typically use a data block size of 4K or greater.
The operating system that runs Oracle Database uses a small operating system block size. For example, if the operating system block size is 1K and the default data block size matches this, the database may be performing an excessive amount of disk I/O during normal operation. For best performance in this case, a database block should consist of multiple operating system blocks.
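The last point, a database block consisting of multiple operating system blocks, comes down to a simple divisibility check. The sketch below is only an illustration with example values; is_multiple_of_os_block is a hypothetical helper name.

```shell
# is_multiple_of_os_block is a hypothetical helper: it succeeds when the
# database block size is a whole multiple of the operating system block size.
is_multiple_of_os_block() {
    db_block=$1
    os_block=$2
    [ $((db_block % os_block)) -eq 0 ]
}

# Example values only: an 8 KB database block on a 4 KB file system block.
if is_multiple_of_os_block 8192 4096; then
    echo "8192 is a multiple of 4096"
fi
```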

You can simply check your current blocksize per datafile by executing the below SQL query:

SELECT
       name,
       bytes,
       blocks,
       block_size
FROM
       v$datafile;

Sunday, September 02, 2012

Developing big-data triggers

Most of you will know CERN from the LHC, the Large Hadron Collider, the experiment used to discover the Higgs Boson particle. This is one of the most interesting experiments within physics at this moment and the search for the Higgs Boson particle comes into the news quite often. What a lot of people do not realize, however, is that this research is somewhat different from traditional research in the field of physics when it comes to the amount of data.

When Isaac Newton “discovered” gravity it only took him a tree to lean against and an apple to fall down. Those are not a lot of input streams of information. When it comes to finding the Higgs Boson particle we are playing in a totally different field when it comes to the number of inputs. During an event the data capture system will store, every second, a dataflow roughly six times the size of the Encyclopædia Britannica.

The main issue is that the systems will not be able to handle and store all the data presented to the sensors. All sensors will have triggers developed to capture the most important data. As we are talking about finding a particle that has never been observed before, the triggers might discard the Higgs Boson particle data instead of storing it for analysis. Developing the triggers is therefore one of the most crucial and critical parts of the experiment. In the below video Tulika Bose, an assistant professor at Boston University, gives a short introduction to this.

Within CERN the TriDAS Project is responsible for developing the Data Acquisition and High-Level Trigger systems. Those systems select the data, store it and finally result in data that can be analyzed. For this, a large group of scientists and top people from a large number of IT companies have been working together. IT companies like Oracle and Intel have been providing CERN with people and equipment, mainly so they can test their new systems in one of the most demanding and data-intensive setups currently operational.

Below you can see a high-level architecture of the CMS DAQ system. This image comes from the "Technical Design Report, Volume 2" delivered by the TriDAS project team.

In a somewhat more detailed view the system looks like the architecture below from the ALICE project. This shows you the connection to the databases and other parts of the systems.

While finding the Higgs Boson particle is possibly not that interesting for the general public in the short term, having IT companies working together with CERN is, even though that might not be obvious at first. CERN is handling an enormous load of data. The IT companies who participate in this project are building new hardware, software and algorithms that are specific to finding the Higgs Boson particle. However, the developed technology will be used in building solutions that will end up serving customers.

As big-data is getting more and more attention and all kinds of big-data based solutions are being developed, we can see that this is no longer a purely scientific playing field. It is getting into the day-to-day lives of people and will help them in the very near future. So, next time you question what the search for the Higgs Boson particle is bringing you as an individual in the short term, take the big-data part into consideration and do not think it is only interesting to scientists (which is an incorrect statement already, however a topic I will not cover on this blog :-) ).

Oracle Logistic chain

The overall logistic chain used within companies around the world is about 90% the same for all companies. There are some customer-specific variances and the way it is implemented might differ; however, overall they all look quite the same. You purchase goods, you do something and you sell it to customers. Purchasing goods can be purchasing parts from which you build your own product, purchasing goods to mine your final product, or simply purchase, store and sell. All in all your company will buy things. Selling can be to external customers or internal customers (cost centers); however, somewhere your company will make a profit (I hope).

All in all the model will look quite the same and you will see this model reflected in your ERP system. Whether you have Oracle eBS, SAP or another ERP system, it will have a standard logistic chain in the system. Each ERP product will have somewhat different naming for the modules. For Oracle eBS the logistics chain will look somewhat like the one below.

This model gives a quick introduction on where to place the Oracle e-Business Suite modules in the logistics chain. You can see for example Oracle AP, Oracle AR, Inventory, General Ledger, etc.