Monday, December 31, 2012

Social media analysis

IBM is running a project called Smarter Planet, in which it combines many new and emerging technologies and ways of thinking to craft a view of how to build a better world and a smarter planet. Being the tech company it is, IBM also touches on social media and big data. Many companies believe that social media, and the ability to harvest information from it, will be a valuable asset in the future: a future where big data is analyzed on the fly, where we can understand what people are saying and what they mean, and where we can interpret feelings and emotions and have technology act on them.

Secondly, we will have a more instrumented, interconnected and intelligent way of doing and analyzing things, in which we can combine human expressions such as a tweet or blog post with data such as traffic and weather. Combined, this gives us a good view of the state of a person or a group of people.

One of the things that will play a big role in this big data future is social media analysis: where are we in the social media world, what are our connections, who are our opinion providers and who are our opinion consumers? Being able to harvest and give meaning to an enormous amount of data will open up a whole new way of doing traditional business and will provide new business opportunities. New companies and new ways of doing commerce will see the light of day.
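One simple way to start telling opinion providers from opinion consumers is to look at who gets retweeted and who does the retweeting. The sketch below is a minimal illustration under that assumption; the retweet pairs, user names and the function name `opinion_roles` are hypothetical, and real social media analytics would use far richer signals.

```python
from collections import Counter

def opinion_roles(retweets):
    """Count how often each user is retweeted (provider) and retweets others (consumer).

    retweets: iterable of (retweeter, original_author) pairs.
    """
    provided = Counter()
    consumed = Counter()
    for retweeter, author in retweets:
        provided[author] += 1     # the author's opinion was spread by someone else
        consumed[retweeter] += 1  # the retweeter picked up someone else's opinion
    return provided, consumed

# Hypothetical sample data: (retweeter, original_author)
sample = [("bob", "alice"), ("carol", "alice"), ("dave", "alice"), ("alice", "erin")]
providers, consumers = opinion_roles(sample)
print(providers.most_common(1))  # [('alice', 3)]
```

Here "alice" stands out as an opinion provider, while the others mostly consume and pass on her opinions.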

In the above presentation, Graham Mackintosh, an IBM social media analytics expert, explains why social media analytics is so important and how it can be used to derive insights from the growing volume of data that we generate through social media.

Sunday, December 30, 2012

Human emotion algorithm for social media

A couple of posts ago I touched on the subject of big data sentiment analysis and how sentiment analysis will be the next frontier in data analysis. Sentiment analysis will help developers build applications and analysis tools that improve machine-to-human interaction and let technology help humanity in a more integrated and more natural way.

Feelings and computer algorithms are a field in which we do not have much experience at this moment, and many people feel that we will not be able to conquer all areas of emotion. Some people feel that we should not even try to conquer all areas of computerized and digitized emotion. In the video below, Intel futurist Brian David Johnson talks with Dr. Hava Tirosh-Samuelson, Director of Jewish Studies at ASU. Dr. Tirosh-Samuelson talks about transhumanism, our future and the fact that, in her opinion, not all emotions can or should be captured.

According to some, it will be possible to capture human emotion and extract it from a sentence or a work of art, but it will be impossible to give a machine emotions. Giving a machine emotions will most likely be one of the hardest problems in computer science in the coming years; first, however, we have to be able to capture and extract human emotions. This will give machines a much better way to interact with humans. Some projects are working on this: some focus on collecting emotions and some try to make sense of the harvested emotions. One of the most graphically appealing emotion-harvesting projects is by Jonathan Harris and Sep Kamvar. The project continuously scans the internet for blog posts containing the phrase "I feel". On the project page you can see how the world is feeling at this moment.
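The "I feel" harvesting idea can be sketched as a simple pattern match. This is a minimal illustration, not the project's actual implementation: the regular expression, the function name `extract_feelings` and the sample posts are assumptions, and it only captures the single word after "I feel" (so "I feel like dancing" yields "like").

```python
import re

# Match "I feel" followed by one word; a deliberate simplification
FEELING = re.compile(r"\bI feel (\w+)", re.IGNORECASE)

def extract_feelings(posts):
    """Return the word following every 'I feel' in the given posts, lowercased."""
    feelings = []
    for post in posts:
        feelings.extend(word.lower() for word in FEELING.findall(post))
    return feelings

posts = ["Today I feel happy.", "i feel TIRED and I feel old", "No feelings here"]
print(extract_feelings(posts))  # ['happy', 'tired', 'old']
```

Counting the extracted words over many posts gives a crude picture of how a group of people is feeling; interpreting them correctly is the much harder part.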

Below you can see a talk given by Jonathan Harris.

As already stated, capturing human emotions from social media and machine interaction, and giving meaning to them, will be one of the next big problems many companies will try to solve. An algorithm that understands human emotions can be used in many applications and will open a whole new era in how we work and interact with technology. The challenge is that this cannot be done by computer researchers alone; many people from many fields of science will have to work together even to begin to grasp the complexity of crafting such an algorithm.

Wednesday, December 19, 2012

Hadoop datacenter blueprint for HDFS

HDFS (Hadoop Distributed File System) is the core of many big data solutions that depend on an implementation of Hadoop. In many cases a Hadoop implementation consists of a large number of nodes working together to provide the functionality you need. When we are talking about massive multi-node implementations, there are some things to consider that might not come to mind directly. One of them is that the speed of your Hadoop implementation already depends on how you have arranged your network and your datacenter.

When you are planning a rather small implementation of HDFS, you will most likely be fine with a simple layout for your datanodes as shown below. All your datanodes are connected to a single network segment and are most likely all within the same rack cabinet.

As stated, for a simple setup with a relatively low number of datanodes, a layout like the one above will not cause any issues. Issues arise when you have a larger cluster that spans multiple racks, each containing multiple nodes, for example a cluster of 100+ datanodes.

When you define your HDFS you have to set the number of replicas for your blocks. This means that every block will be replicated X times on different nodes. This is set in your hdfs-site.xml configuration file with the property dfs.replication. When data is written to a block, that block is replicated to all other nodes that hold a copy of it. This means that there is a lot of data traffic between your datanodes that is only used to keep your cluster in sync.
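As a rough sketch of what the replication setting means for network traffic, the Python below reads `dfs.replication` from an `hdfs-site.xml` snippet and estimates how much data datanodes copy to each other when a file is written. The XML content and the helper names are hypothetical; the estimate assumes HDFS's write pipeline, where each replica after the first costs one datanode-to-datanode copy of the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml content; dfs.replication is the real property name
HDFS_SITE = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>"""

def replication_factor(xml_text, default=3):
    """Read dfs.replication from hdfs-site.xml text, falling back to the HDFS default of 3."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == "dfs.replication":
            return int(prop.findtext("value"))
    return default

def datanode_transfer_mb(file_mb, replication):
    """Data copied between datanodes when a file is written through the pipeline.

    The client writes the first replica; each further replica is one
    datanode-to-datanode copy of the full data.
    """
    return file_mb * (replication - 1)

r = replication_factor(HDFS_SITE)
print(r, datanode_transfer_mb(1024, r))  # 3 2048
```

So with a replication factor of 3, every 1 GB written causes roughly 2 GB of extra traffic between datanodes, before any of it is used for actual processing.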

Within a simple setup this is not a real issue; however, when spanning 100+ nodes in multiple racks, your data will have to travel a lot and will take up network resources that then cannot be used for other things.

In the image below you can see a more complex landscape with 15 datanodes positioned over 3 racks. Every rack has a top-of-rack switch that connects its HDFS datanodes to the rack network backbone, and all racks are connected to the cluster backbone for communication between the nodes in different racks.

If we do not take any action, we can end up in a situation with a lot of communication between datanodes in different racks, and therefore a lot of traffic over the cluster backbone and multiple rack backbones. This can cause additional network latency and potential bottlenecks, especially at your cluster backbone switch. For example, datanode 2 could hold data that needs to be replicated to datanode 7 and datanode 10, which are both in a different rack.
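The datanode 2 example can be made concrete with a small model. It assumes the 15 datanodes are numbered 1 to 15 with 5 per rack in order (so datanodes 7 and 10 share a rack), which the image does not state, and the function names are hypothetical. It compares the source copying to each target directly with HDFS-style pipelining, where the block is passed along a chain of datanodes.

```python
NODES_PER_RACK = 5  # assumption: datanodes 1-15 numbered sequentially, 5 per rack

def rack_of(node):
    """Rack number (1-based) for a datanode number under the assumed layout."""
    return (node - 1) // NODES_PER_RACK + 1

def fan_out_cross_rack(source, targets):
    """Cross-rack copies if the source sends the block to every target itself."""
    return sum(1 for t in targets if rack_of(t) != rack_of(source))

def pipeline_cross_rack(path):
    """Cross-rack hops if the block is passed along a chain of datanodes."""
    return sum(1 for a, b in zip(path, path[1:]) if rack_of(a) != rack_of(b))

print(rack_of(2), rack_of(7), rack_of(10))  # 1 2 2
print(fan_out_cross_rack(2, [7, 10]))       # 2
print(pipeline_cross_rack([2, 7, 10]))      # 1
```

Every cross-rack copy counted here travels over the cluster backbone, which is exactly the traffic that knowing each node's rack helps to reduce.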

To prevent this from happening, you can make HDFS aware of the physical location of its datanodes in the datacenter and state that replication should stay between datanodes within the same rack. The topology scripting lets you state the rack ID of each datanode and helps you limit the traffic between racks. This gives you a cluster model as shown below, with a root (the cluster) and 2 racks each containing 3 datanodes.
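Rack awareness is typically configured by pointing Hadoop at a topology script: the property `topology.script.file.name` in Hadoop 1.x, or `net.topology.script.file.name` in Hadoop 2.x and later, both set in core-site.xml. Hadoop calls the script with datanode addresses as arguments and reads back one rack path per address. The sketch below is a minimal Python version; the IP addresses and rack names are hypothetical placeholders for your own layout.

```python
#!/usr/bin/env python3
"""Minimal rack-awareness topology script sketch for Hadoop.

Hadoop passes one or more datanode IPs/hostnames as arguments and reads
one rack path per argument from stdout.
"""
import sys

# Hypothetical addresses and rack IDs: replace with your own datacenter layout
RACK_MAP = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.11": "/rack2",
    "10.0.2.12": "/rack2",
}
DEFAULT_RACK = "/default-rack"  # used for any host not in the map

def resolve(hosts):
    """Map each host to its rack path, in the same order as given."""
    return [RACK_MAP.get(host, DEFAULT_RACK) for host in hosts]

if __name__ == "__main__":
    print("\n".join(resolve(sys.argv[1:])))
```

Keeping the mapping in a plain dictionary makes it easy to generate from an inventory system; a lookup table in a file would work just as well.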

When planning your rack setup, both physically and in your cluster configuration, there are some things to consider. If you allow replication only within a single rack, and this rack is a physical rack as well as a software-configured rack, you can run into trouble when the rack fails, for example due to a power loss. You will lose access to all the data stored within this rack, as all the replicas are in the same rack.

When planning, you have to consider this and think about using virtual racks, where for example all odd-numbered servers in physical racks A and B are configured into one (virtual) rack and all even-numbered servers in physical racks A and B into another virtual rack. To cope with the possible loss of a switch and to provide fast communication between the racks, you might want to consider connecting some switches with fiber to ensure a fast connection between physical racks.
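The odd/even virtual rack idea can be checked with a few lines of Python. This sketch assumes a hypothetical layout of four servers numbered 1 to 4 in each of physical racks A and B, and hypothetical virtual rack names; it shows that when physical rack A fails, both virtual racks still have servers left in rack B.

```python
def virtual_rack(server_number):
    """Assign odd-numbered servers to one virtual rack and even-numbered to another."""
    return "/vrack-odd" if server_number % 2 else "/vrack-even"

def surviving_virtual_racks(servers, failed_physical_rack):
    """Virtual racks that keep at least one server when a physical rack fails.

    servers: list of (physical_rack, server_number) tuples.
    """
    surviving = set()
    for physical, number in servers:
        if physical != failed_physical_rack:
            surviving.add(virtual_rack(number))
    return surviving

# Hypothetical layout: 4 servers in each of physical racks A and B
servers = [(rack, n) for rack in ("A", "B") for n in range(1, 5)]
print(sorted(surviving_virtual_racks(servers, "A")))  # ['/vrack-even', '/vrack-odd']
```

If the virtual racks mapped one-to-one onto the physical racks instead, losing rack A would take a whole virtual rack, and all replicas kept inside it, offline.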

Tuesday, December 18, 2012

Oracle database and Cisco Firewall considerations

Oracle databases are very commonly used in secured environments and are in many cases considered a vital part of a company's IT infrastructure. Oracle databases often provide the database layer for core financial and operational systems. For this reason, the security of your database and the design of the entire infrastructure need to be on the minds of everyone involved. One of the things commonly overlooked by application administrators, database administrators and developers, however, is how the infrastructure is built. In many companies the infrastructure is handled by a different department than the people responsible for the applications and the database.

While it is good that everyone within an enterprise has a dedicated role, this is also a big disadvantage in some cases. The image below represents how many application administrators and DBAs draw their application landscape (for a simple single database / single application server design).

In essence this is correct, but in some cases too simplified. A network engineer will draw a completely different picture, a UNIX engineer another one and a storage engineer yet another, so you end up with several images. One area where many disciplines have to work together, however, is security. The picture below shows the same setup as the image above, now with some vital parts added to it.

The important addition in the above picture is that it shows the different network components, such as (V)LANs and firewalls. Routers, switches and such are left out of this image. This matters because not understanding where the firewalls are located can seriously harm the performance of your application. For many people who are not network specialists, a firewall is just something your data passes through that should not hinder you in any way. Firewall vendors try to make this true, but in some cases they do not succeed.

Even though having a firewall between your application server and database server is a good idea, when it is not configured correctly it can affect the speed of your application dramatically. This is especially the case when your database returns large sets of data.

The reason for this is how a firewall appliance, in our example a Cisco PIX/ASA, handles SQL traffic, and how some of the default global inspection in firewalls is still based on an old SQL*Net implementation of Oracle. By default a Cisco PIX/ASA has some inspect rules enabled that do deep packet inspection of all communication flowing through the firewall. As Oracle uses port 1521 by default, the firewall applies this inspection to all traffic on port 1521 between the database and a client/application server.

This dates back to the SQL*Net V1.x days and is still implemented in many firewalls to stay compatible with old versions of SQL*Net. With SQL*Net V1.x you basically had 2 sessions with the database server:

1) Control sessions
2) Data sessions

You would connect with your client to the database on port 1521; this is referred to as the "control session". As soon as your connection was established, a new process would spawn on the server to handle the incoming requests. This new process received a new port to communicate with you specifically for this session. The issue was that the firewall in the middle would block the new process from communicating with you, as this port was not open. For this reason the inspect rules on the firewall would read all the data on the control session and look for a redirect message telling your client to communicate with the new process on the new port on the database server. As soon as the firewall found this message, it would open a temporary port for communication between the database and your client. From that moment on, all data would flow via the data session.

As you can see in the above image the flow is as follows:
A) The client initiates the session via the control session
B) The database server creates a new process and sends a redirect message with the data session port
C) All data communication goes via the newly opened port on the firewall in the data session

This model works quite well for a SQL*Net V1.x implementation; however, with SQL*Net 2.x Oracle abandoned the separate control and data sessions and introduced a "direct hand-off" implementation. This means that no new port is opened and the data session and the control session are one. Because of this, all data flows via the control session, where your firewall is listening for a redirect that will never come.
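To make the difference between the two SQL*Net versions concrete, here is a toy model of the inspect engine. It is not how a PIX/ASA is implemented: real redirect messages are binary TNS packets, and the readable `REDIRECT port=...` string, the port number and the message sizes are hypothetical. It shows that with a V1.x-style redirect only the short control session is inspected, while with V2.x-style direct hand-off every result row passes through the inspection.

```python
import re

# Hypothetical, human-readable stand-in for the binary SQL*Net redirect message
REDIRECT = re.compile(r"REDIRECT port=(\d+)")

def inspect_control_session(messages):
    """Toy inspect engine: scan every control-session message for a redirect.

    Returns (ports_opened, bytes_inspected).
    """
    ports_opened = []
    bytes_inspected = 0
    for msg in messages:
        bytes_inspected += len(msg)  # every message is buffered and scanned
        match = REDIRECT.search(msg)
        if match:
            ports_opened.append(int(match.group(1)))  # open a temporary pinhole
    return ports_opened, bytes_inspected

# SQL*Net V1.x style: short control session, data then flows on the redirected port
v1 = ["CONNECT", "REDIRECT port=40123"]
# SQL*Net V2.x style: no redirect, result rows travel on the control session
v2 = ["CONNECT"] + ["ROW " + "x" * 100] * 1000

print(inspect_control_session(v1))     # ([40123], 26)
print(inspect_control_session(v2)[0])  # []
```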

The firewall is designed so that all traffic on the control session is received, placed in a memory buffer, processed by the inspect routine and then forwarded to the client. As the control session is not meant to be data intensive, the memory buffer is very small and the processors handling the inspection are not very powerful.

Since from SQL*Net V2.x onward all data is sent over the control session rather than a separate data session, all your data, all your result sets and everything your database returns will be received by your firewall and placed in the memory buffer to be inspected. The buffer is generally around 2 KB, so it cannot hold much data and your communication will be slowed down by the firewall. If you are using SQL*Net V2.x, this inspection is completely useless and there is no reason to keep it enabled on your firewall. Best practice is to disable it, as it has no use with newer implementations of SQL*Net and will only slow things down.
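Using the roughly 2 KB buffer size mentioned above, a quick calculation shows how many receive, inspect and forward cycles a result set causes. The function name is hypothetical and the buffer size is the approximate value stated in this post, not a measured one.

```python
import math

def buffer_fills(result_bytes, buffer_bytes=2 * 1024):
    """Number of times a result set fills the firewall's inspection buffer."""
    return math.ceil(result_bytes / buffer_bytes)

print(buffer_fills(10 * 1024 * 1024))  # 5120
```

A single 10 MB result set therefore goes through thousands of inspection cycles on a component that was never sized for bulk data.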

You can check on your firewall which inspections are currently turned on, as shown in the example below:

class-map inspection_default
 match default-inspection-traffic
policy-map type inspect dns preset_dns_map
  message-length maximum 512
policy-map global_policy
 class inspection_default
  inspect dns preset_dns_map 
  inspect ftp 
  inspect h323 h225 
  inspect h323 ras 
  inspect rsh 
  inspect rtsp 
  inspect esmtp 
  inspect sqlnet 
  inspect skinny 
  inspect sunrpc 
  inspect xdmcp 
  inspect sip 
  inspect netbios 
  inspect tftp 
service-policy global_policy global
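If you manage many firewalls, a small script can flag configurations where the SQL*Net inspection is still enabled. The sketch below is a hypothetical helper that scans configuration text like the listing above for `inspect sqlnet` under `policy-map global_policy`; it assumes the same indentation, with top-level commands starting in the first column. Disabling the inspection itself is done on the ASA by entering `no inspect sqlnet` under `class inspection_default` within `policy-map global_policy`.

```python
def sqlnet_inspection_enabled(config_text):
    """Return True if 'inspect sqlnet' appears under 'policy-map global_policy'.

    Assumes top-level commands start in column 0 and sub-commands are indented.
    """
    in_global_policy = False
    for line in config_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if not line[0].isspace():  # a top-level command starts a new section
            in_global_policy = stripped == "policy-map global_policy"
        elif in_global_policy and stripped == "inspect sqlnet":
            return True
    return False

config = """policy-map global_policy
 class inspection_default
  inspect ftp
  inspect sqlnet
service-policy global_policy global"""
print(sqlnet_inspection_enabled(config))  # True
```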

Sunday, December 16, 2012

Big data Sentiment analysis

A lot of people who are currently working on, or thinking about, big data immediately start talking about how they can implement certain things: how much data they will have to store and how much data they will have to process to find meaning in it. Many start talking about how they will store things and how they will need to build Hadoop implementations and the Hadoop Distributed File System to cope with the load of data.

I recently had a discussion with a couple of people, and we were wondering whether we could dynamically route taxis in a city based on big data and social media feeds you can receive from the internet. The second thing we were discussing was whether we could measure customer satisfaction by linking a customer's tweets back to the trip he just made in the taxi.

It is true that the amount of incoming data that needs processing before it finally results in routing taxis will be enormous. It is also true that you will need a lot of computing power, and yes, you will most likely need Hadoop-like solutions to process this. However, there is a much more subtle part to this that a lot of people overlook.

Finding the need to route one or more taxis to a part of the city where we expect to find a lot of customers, based on social media and historical data, is challenging; however, there is an even bigger and more challenging part, and that is at the end. That is the part we raised while discussing whether we could tell if the customer was happy from what he published on social media just after using the taxi.

Let's say we know that a person called Tim, with the Twitter account @TimTheTaxiCustomer, entered a taxi. He is picked up and dropped off during rush hour a couple of blocks away from his original location. We are monitoring Tim's Twitter feed, and he now sends a tweet with the following text:

"made it to my meeting, wonder what would happen if I had taken the subway"

He could also have sent a tweet like this:

"Made it to my meeting, never had this when I used the subway"

Looking at these texts, a human can, with some luck, work out what Tim means and whether he is happy with the service. It takes some skill, but we can conclude that in the first tweet he is jokingly referring to the subway and insinuating that he would never have made it by subway, meaning good service from the taxi company. The second tweet, however, indicates that he never experienced this with the subway, which could mean that he had to run to make it to his meeting.
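To see why this is hard for a machine, consider a deliberately naive lexicon-based scorer that simply adds up word weights. The lexicon, weights and function name below are hypothetical. It rates the first tweet as 1 and the second as 0 purely because of the words "made" and "never"; it has no notion of the joke in the first tweet or of the implied comparison with the subway, which is exactly the part a human reads between the lines.

```python
import re

# Tiny hypothetical lexicon: +1 for "positive" words, -1 for "negative" words
LEXICON = {"made": 1, "happy": 1, "great": 1, "never": -1, "late": -1, "missed": -1}

def naive_score(text):
    """Sum lexicon weights of the words in text; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)

tweet_1 = "made it to my meeting, wonder what would happen if I had taken the subway"
tweet_2 = "Made it to my meeting, never had this when I used the subway"
print(naive_score(tweet_1), naive_score(tweet_2))  # 1 0
```

That the scores happen to rank the tweets in a plausible order is an accident of the word list, not understanding.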

This process is complex even for humans from time to time and is heavily driven by local slang, local ways of saying things and cultural influences. Having a human read all the messages of all our potential taxi customers is not feasible, given the number of messages that need to be checked and the speed at which they are produced. To cope with the load we will need an algorithm to check the messages, and to let that algorithm keep up with the pace at which messages are generated we will have to run it on a cluster that uses massively parallel computing.
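Running such an algorithm on a cluster usually follows a map/reduce pattern: split the messages into partitions, classify each partition independently, then merge the partial counts. The sketch below shows that pattern in a single Python process with hypothetical word lists and sample messages; on a real cluster (for example with Hadoop MapReduce) each `map_partition` call would run on a different node.

```python
from collections import Counter
from functools import reduce

# Hypothetical word lists for a toy classifier
POSITIVE = {"great", "happy", "fast"}
NEGATIVE = {"late", "slow", "missed"}

def partition(messages, n):
    """Split messages round-robin into n partitions (one per worker node)."""
    return [messages[i::n] for i in range(n)]

def map_partition(messages):
    """Map step: count positive/negative messages in one partition."""
    counts = Counter()
    for msg in messages:
        words = set(msg.lower().split())
        if words & POSITIVE:
            counts["positive"] += 1
        if words & NEGATIVE:
            counts["negative"] += 1
    return counts

def reduce_counts(partials):
    """Reduce step: merge the per-partition counts."""
    return reduce(lambda a, b: a + b, partials, Counter())

messages = ["great ride", "driver was late", "happy with the taxi",
            "slow traffic but fast driver"]
totals = reduce_counts(map_partition(p) for p in partition(messages, 2))
print(totals)
```

Because each partition is classified without looking at the others, adding nodes lets the system keep up with a growing stream of messages.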

Creating an algorithm capable of understanding human emotions is the field of sentiment analysis.
"Sentiment analysis or opinion mining refers to the application of natural language processing, computational linguistics, and text analytics to identify and extract subjective information in source materials. Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document. The attitude may be his or her judgment or evaluation (see appraisal theory), affective state (that is to say, the emotional state of the author when writing), or the intended emotional communication (that is to say, the emotional effect the author wishes to have on the reader)."

In the video below, Michael Karasick touches on this subject quite well in his IBM Research video:

IBM Research - Almaden Vice President and Director Michael Karasick presents a brief overview of "Managing Uncertain Data at Scale." Part of the 2012 IBM Research Global Technology Outlook, this topic focuses on the Volume, Velocity, Variety and Veracity of data, and how to harness and draw insights from it.

Monday, December 10, 2012

Using metasploit for an attack on Oracle

On an "extremely" frequent basis I get the question, via all kinds of channels, from people asking how they can hack into an Oracle database / Oracle application server / Oracle X product. In the past I did post some things about certain security options in Oracle that were not always as good as you would like them to be. I do, however, try to stay away from advising people on how to hack into a certain system. If you are determined enough, you will find a way into every system. So please do not expect a copy/paste answer to this question from me.

Having stated this, I do provide some advice on things to do or not to do when securing your environment. One of the things I encounter every now and then is quite a security threat: half-finished installations and/or installations of Oracle products that have not been cleaned of default screens and logins.

When installing an Oracle product, for example an application server or database, you commonly get some default (HTTP) pages and screens that help you during the initial installation and code deployment and guide you to the correct login pages. This is all fine as long as you are aware that these pages exist and take appropriate countermeasures. The screenshot below shows one of the default screens of the Oracle application server.

Having such a page is one of the first things that can attract a potential hacker to your server. If, for example, you have connected your application server to the internet, this page can be indexed by a search engine like Google. An attacker looking for a certain version of an Oracle product can use Google quite easily to find possible targets.

For those of you who have experience with security, especially the security of web-attached systems and the ways a complete system can be compromised through the HTTP attack angle, this will all make sense. And even if you do not know all the default pages a newly installed system might populate, a good administrator will look for them and remove them where appropriate.
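As a starting point for the administrator's check described above, the sketch below requests a list of paths on your own server and reports which ones answer with HTTP 200. The path list and function names are hypothetical placeholders (only `/isqlplus` is taken from the iSQL*Plus portal mentioned in this post); fill the list with the default pages your product version actually installs, and only run it against servers you administer.

```python
import urllib.error
import urllib.request

# Hypothetical starting list; add the default pages your product version installs
CANDIDATE_PATHS = ["/", "/isqlplus"]

def http_status(url, timeout=5):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:  # connection refused, DNS failure, timeout
        return None

def exposed_default_pages(base_url, paths=CANDIDATE_PATHS, fetch=http_status):
    """List the candidate paths that answer 200 OK on your own server."""
    base = base_url.rstrip("/")
    return [path for path in paths if fetch(base + path) == 200]

# Example (hypothetical host): exposed_default_pages("http://appserver.example.com")
```

The `fetch` parameter makes it easy to plug in authentication or a proxy, or to test the logic without touching the network.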

For those of you who are less familiar with this, the presentation below gives some insight. It was given by Chris Gates at the 2011 Black Hat convention, where he showed some of the ways to use Metasploit to attack an Oracle-based system.

BlackHat DC 2011 Gates Attacking-Oracle-Web-Slides

In 2009, Metasploit released a suite of auxiliary modules targeting oracle databases and attacking them via the TNS listener. This year lets beat up on...errr security test Oracle but do it over HTTP/HTTPS. Rather than relying on developers to write bad code lets see what we can do with default content and various unpatched Oracle middleware servers that you’ll commonly run into on penetration tests. We’ll also re-implement the TNS attack against the isqlplus web portal with Metasploit auxiliary modules.

Tuesday, December 04, 2012

Use oracle database trigger for audit

In a recent post on the Oracle forums a user asked the following question: "I have audit trail for the table *****_INSTALL_BASE_ALL for columns attribute10, attribute21, attribute22 if someone changes these columns I want to get the alert for the same". I answered that this could be done with a database trigger, and that you can write code to send yourself a mail or raise an alert in any other way.

Giving this answer was the quick response; however, it triggered me, as I have received a couple of questions that all involve logging of some sort when a value in a table changes. For some reason, the use of triggers in the database, and what you can do with them, is not widespread knowledge. Triggers can be used for all kinds of reasons; one of the most important reasons I am personally in favor of them is that you can add logic to an insert or update without having to change anything in the code that initiates that insert or update.

If you have, for example, a packaged application that is allowed to update some values in a table, and you want to add logging without changing the application or adding customizations to its code, you can easily achieve this by adding a trigger to the table.
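Oracle triggers can be limited to specific columns with `AFTER UPDATE OF col1, col2 ON table`, which matches the forum question about attribute10, attribute21 and attribute22. To keep the example runnable anywhere, the sketch below uses SQLite through Python's `sqlite3` module, which supports the same `UPDATE OF` clause; the table names and the extra `note` column are hypothetical stand-ins. The "application" UPDATE statements stay untouched, and only the update that touches a watched column produces an alert row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-in for the forum poster's table
CREATE TABLE install_base (
    id INTEGER PRIMARY KEY,
    attribute10 TEXT, attribute21 TEXT, attribute22 TEXT, note TEXT
);
CREATE TABLE change_alert (id INTEGER, message TEXT);

-- Fires only for UPDATEs whose SET list names one of the watched columns
CREATE TRIGGER install_base_attr_alert
AFTER UPDATE OF attribute10, attribute21, attribute22 ON install_base
FOR EACH ROW
BEGIN
    INSERT INTO change_alert VALUES (NEW.id, 'watched attribute changed');
END;

INSERT INTO install_base VALUES (1, 'a', 'b', 'c', 'x');
""")

# The application code stays unchanged: plain UPDATE statements
conn.execute("UPDATE install_base SET note = 'y' WHERE id = 1")         # no alert
conn.execute("UPDATE install_base SET attribute21 = 'z' WHERE id = 1")  # alert
alerts = conn.execute("SELECT id, message FROM change_alert").fetchall()
print(alerts)  # [(1, 'watched attribute changed')]
```

Note that `UPDATE OF` fires whenever a listed column appears in the SET clause, even if its value does not actually change; comparing OLD and NEW values inside the trigger filters those cases out.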

In the example below we have an application used to create contracts for customers. Every contract type has a certain profit margin associated with it. Every newly created contract looks up its contract type in this table and selects the profit margin that needs to be applied. Changing this value can have a huge impact on the financial results of the company. For this reason you might want extra logging and auditing in place, so you can trace who changed what and when.

In this example we have a table named CONTRACT_PROFIT:

desc contract_profit
Name                 Null Type         
-------------------- ---- ------------ 
CONTRACT_TYPE             VARCHAR2(20) 

For our audit process we have a table called CONTRACT_AUDIT, where we store the log information:

desc contract_audit
Name      Null     Type           
--------- -------- -------------- 

What we want to achieve is that whenever someone changes a value in the table CONTRACT_PROFIT, a log entry is written to the table CONTRACT_AUDIT, so we have a trail of what changed when. The layout of the table is very simple; you can make it as sophisticated as you like, but for this example we have kept it as simple as possible.

At the start of our example the table CONTRACT_PROFIT is filled with the following data:

CONTRACT_TYPE        CONTRACT_PROFIT_RATE
-------------------- --------------------
DIRECT_SALES         20                   
INDIRECT_SALES       15                   
DIRECT_LEASE         19                   
INDIRECT_LEASE       14        

We can now define a trigger on the table CONTRACT_PROFIT that writes a log line to the audit table. The example below shows how we defined the trigger. It fires on an update statement, once for every updated row.

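CREATE OR REPLACE TRIGGER contract_profit_after_update
  AFTER UPDATE ON contract_profit
  FOR EACH ROW
DECLARE
  v_username VARCHAR2(128);  -- sized for long user names (10 was too small)
BEGIN
  SELECT USER INTO v_username FROM dual;
  -- Assumes CONTRACT_AUDIT has a single text column holding the log line
  INSERT INTO contract_audit
  VALUES ('Last change made on: ' || SYSTIMESTAMP
          || ' . Change made by user: ' || v_username
          || ' . New value for ' || :NEW.contract_type
          || ' is now ' || :NEW.contract_profit_rate
          || ' .');
END;
/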

If we now update the table CONTRACT_PROFIT and change the value of CONTRACT_PROFIT_RATE for DIRECT_SALES from 20 to 21, the trigger fires and you will see a new line in the table CONTRACT_AUDIT. An example of this line is shown below:

Last change made on: 04-DEC-12 AM US/PACIFIC . Change made by user: LOUWERSJ .New value for DIRECT_SALES is now 21 .

You might want to add other information to the log, or you might not want an insert statement at all and would rather have a mail action defined. All of this can be done when developing your trigger.


Monday, December 03, 2012

Find developers for your startup

When you are working on a new product or building a tech startup, it is hard to find the right people to join you. In many cases a department working on a new concept, or a group of people working on an idea and trying to build a startup around it, is tight on budget. You want to hire the right people with the right skills straight away and do not have time to make mistakes in your hiring process.

As a startup you will not have an HR department, nor will you have the funding to hire a headhunter to find the right person for you. As a result you see a lot of startups filled with people who know someone already working there. This can limit you, because you depend on the social circle of your employees.

The people behind "Work for Pie" have built a solution for this. They have a lot of experience in building a social network where open source projects can find the right developers. Recently they launched Work For Pie for Companies, a platform on which companies can interact with developers to find the right person. You can see it as a mix between LinkedIn, a dating site and a job site.

Robert Scoble interviewed the people behind "Work for Pie" for his Rackspace show, in which he interviews the people behind new tech startups. Work for Pie helps you build a showcase of your work, which takes less than 10 minutes. This showcase is much better than an old-style resume. Here the founders explain why in our group of startup interviews at TechCrunch Disrupt.

Disk IO operations for in-memory databases

When working with in-memory databases, a lot of people overlook the fact that disk I/O can still be a bottleneck. It is true that most operations are done in memory when working with an in-memory database. However, slow disk I/O can dramatically impact the performance of your operations even when using an in-memory database. You have to make some decisions in your hardware and in your application design when working with an in-memory database.

The Oracle TimesTen In-Memory Database is a good example for this subject.
Oracle TimesTen In-Memory Database (TimesTen) is a full-featured, memory-optimized, relational database with persistence and recoverability. It provides applications with the instant responsiveness and very high throughput required by database-intensive applications. Deployed in the application tier, TimesTen operates on databases that fit entirely in physical memory (RAM). Applications access the TimesTen database using standard SQL interfaces. For customers with existing application data residing on the Oracle Database, TimesTen is deployed as an in-memory cache database with automatic data synchronization between TimesTen and the Oracle Database.

As a quick introduction to TimesTen, Oracle has provided a short, 3-minute YouTube video showing what Oracle TimesTen can do:

One of the interesting topics already touched on in the video is what happens to your in-memory data when the server fails. The answer is that TimesTen makes it persistent. Making it persistent in this case means writing it to disk, and writing data to disk means disk I/O. So an in-memory database is not 100% purely in-memory; it still uses your disks.

Oracle TimesTen uses durable commits to make a commit persistent. With a durable commit, the commit is not only written to the database structure held in memory but also to disk. It is good to know that by default every commit is a durable commit and therefore results in disk I/O. Developers can decide whether to use a normal (durable) commit or a non-durable commit.

As an example, if you are building a webshop application on a TimesTen database, you can decide that every item a user places in the basket is written to memory using a non-durable commit. When the order is placed, the order itself is written with a durable commit. This means that if the database crashes, all items users have placed in a basket will be lost and will have to be placed again, but an order that has been placed is stored safely on disk and is still present when the database is loaded into memory again during start-up.
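The webshop example can be modeled with a toy store in Python. This is not the TimesTen API: the class, the log file and the record names are hypothetical. Durable commits here flush the buffered log records to disk with `os.fsync`, which is the disk I/O discussed above; non-durable commits only update memory and the log buffer. Simulating a crash by rebuilding the store from the log shows the order surviving while the later basket item is lost.

```python
import os
import tempfile

class ToyStore:
    """Toy in-memory store with durable and non-durable commits.

    A durable commit flushes the log buffer to disk (including earlier
    non-durable commits still in the buffer); a non-durable commit only
    updates memory and the buffer. Not the TimesTen API.
    """

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = []        # committed records, held in memory
        self.log_buffer = []  # committed records not yet on disk
        if os.path.exists(log_path):  # recovery at start-up: reload what reached disk
            with open(log_path, encoding="utf-8") as f:
                self.data = [line.rstrip("\n") for line in f]

    def commit(self, record, durable):
        self.data.append(record)
        self.log_buffer.append(record)
        if durable:
            self._flush()

    def _flush(self):
        # This is the disk I/O that a durable commit has to wait for
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.writelines(rec + "\n" for rec in self.log_buffer)
            f.flush()
            os.fsync(f.fileno())
        self.log_buffer.clear()

log_path = os.path.join(tempfile.mkdtemp(), "toy.log")
shop = ToyStore(log_path)
shop.commit("order:1001", durable=True)            # placed order: durable
shop.commit("basket:user42:item7", durable=False)  # basket item: non-durable
shop = ToyStore(log_path)  # simulate a crash: memory and unflushed buffer are gone
print(shop.data)  # ['order:1001']
```

Only the durable commit paid for an `fsync`, which is why slow disks hurt exactly at the moments you need durability.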

More detailed information about durable and non-durable commits in an Oracle TimesTen database can be found in the documentation. The chapter "use durable commits appropriately" is especially recommended for developers who are new to in-memory database architectures.

Non-durable commits are in general faster than durable commits, which can improve the performance of your application dramatically. However, at some point you will most likely want to use a durable commit to safeguard the final transaction. If your system is under heavy load and you have slow disk I/O, this can hinder the performance of your application. In the video below from Intel you can see an example of this, and how solid-state disks improve it in the showcase shown in the video.

See Diane M. Bryant, Vice President & General Manager of the Datacenter and Connected Systems Group (DCSG) at Intel, walk through a compelling Big Data server demonstration featuring Intel® Xeon® processor-based servers and Intel® Solid-State drives running Oracle TimesTen® In-Memory Database. Chuck DuVall, Sr. Technical Engineer explains the demo details which highlight the need for solid state drive speed for persistent data while running In-memory databases.

Sunday, December 02, 2012

SoMoClo outlook for 2013

One of the futures of computing, and of cloud computing in particular, is the combination of mobile technology, social media and cloud computing. From a technology point of view the three concepts complement each other: combining social media concepts with a mobile (and location-aware) platform, and moving the storage and processing of the resulting data streams to the cloud to provide an always-on computing backbone, delivers an interesting new concept. It enables companies to tap quickly into a B2C market segment and to build pilot projects and tests of SoMoClo concepts quickly and at low cost.

Every company starting a SoMoClo initiative has to be aware that cloud computing is the basis of the SoMoClo concept. One of the main reasons is the flexibility of cloud computing: you can quickly ramp computing resources up and down, and you can quickly adopt services from other cloud-enabled companies that provide APIs to their functionality. Any company that starts a SoMo(Clo) project without taking a good look at the CLOud part is making a big mistake and will most likely end up with (A) an unsatisfactory RTO, (B) a disappointing group of users and (C) a failed project.

The Aberdeen group has the following statement about this:
The rate of transformation in IT is the fastest it has ever been, and increasing almost daily. The rising number of cores on processors allows the compute capacity of servers to grow dramatically. With faster local and wide area networks, servers can be placed anywhere. The amount of data stored by organizations is doubling every 2 years. At the same time, new mobile computing devices are being deployed to end users, allowing them to work anyplace and anytime, with constant access to new forms of social media. These are exciting times, but where is this disruptive change taking us? Aberdeen’s Telecom and Unified Communications, Wireless and Mobility Communications, and IT Infrastructure research analysts have been watching these trends for several years and has observed a growing convergence of Social, Mobile and Cloud computing. These three computing revolutions are creating a new computing paradigm. The future is a converged computing infrastructure, which Aberdeen has termed “SoMoClo” to underscore the integrated nature of this single overriding trend.


In a recent article, Scott Hebner, currently Vice President at IBM for Cloud & Business Infrastructure Management, discussed how to make SoMoClo ready for the enterprise. Interestingly, the article covers several aspects of this. On the one hand, it covers how you, as an enterprise, should adopt SoMoClo technology to grow your business by using it both externally and internally.

It also addresses the fact that your employees are already using it and that you most likely cannot stop them. Even if you have policies on the use of public cloud services, and even if you have banned them, it is very likely that your employees are using services such as Dropbox a lot. It is therefore better not to ban online and cloud services, but to adopt them in such a way that you can channel their use and make sure it becomes more compliant with internal rules and regulations.

If you ban your staff from using any online service, it will most likely not work and your employees will use every service available online anyway. However, if you create a policy stating that they may use this, this and this service, but not THAT specific one for a good reason, you will notice that your employees will most likely accept it without any issue. Allowing these services will bring issues with it, but coping with those issues and problems will bring you more benefit in the end than blocking their use for your entire workforce.

Interestingly, this is exactly the same discussion we had some time ago about BYOD (Bring Your Own Device). In that discussion, Bruce Schneier, CTO of BT Managed Security Solutions, stated that he saw a big security risk in a BYOD policy but would never oppose it: the benefits to companies were so big, and users would do it anyway, so in the end it would be better to channel it.

The same goes for the use of cloud, mobile cloud and social mobile cloud solutions. Your employees will be using them, and are already using them, so you had better channel them and make use of them internally and externally. The same goes for building your own SoMoClo: as the adoption of SoMoClo applications is picking up at an ever-increasing pace, your company can gain a big benefit from a SoMoClo strategy if it is understood, adopted and executed in the correct way.

SSH use specific private key

When connecting remotely to a Linux host, most of us will (I hope) use SSH to establish a secure connection. Some people use a single key pair for all the hosts they need to connect to when they need password-less authentication to a Linux machine. However, when you work in different environments with different keys (and usernames), you can end up with a list of key pairs for all kinds of hosts. To instruct the SSH client to use a specific key when connecting to a specific machine, you can use the -i option.

For example, if the private key for the server is stored in /user/jolouwer/backup_key, you can connect to this host as shown below. In our case we connect as the user dummy; <server> is a placeholder for the host name:

ssh -i /user/jolouwer/backup_key dummy@<server>

This reads the specific key located in /user/jolouwer/backup_key and uses it to connect to the server. You can connect to every server like this, but you will have to remember which key to use for which server. It is easier to list them in your SSH config.

You can do so by adding some configuration to ~/.ssh/config. For our example we add the following 2 lines to the file, where <server> is a placeholder for the host name:

Host <server>
    IdentityFile /user/jolouwer/backup_key

This ensures that the mentioned key is used every time you connect from your workstation to the host.
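To see which key SSH would pick up for a given host, the lookup can be approximated in a few lines of Python. This is a simplified sketch, not how OpenSSH parses its configuration: the function `identity_files_for` and the host names `backupserver` and `otherserver` are hypothetical, only the Host and IdentityFile keywords are handled, `*` and `?` wildcards are matched with `fnmatch`, and negated patterns, the `keyword=value` form and Match blocks are ignored. Settings before the first Host line apply to every host, and multiple matching IdentityFile lines accumulate; SSH tries the collected keys in sequence.

```python
import fnmatch


def identity_files_for(host, config_text):
    """Return the IdentityFile values that apply to `host` (simplified)."""
    applies = True  # lines before the first Host keyword apply to all hosts
    found = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        value = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "host":
            # A Host line starts a new section; it applies if any pattern matches.
            applies = any(fnmatch.fnmatchcase(host, p) for p in value.split())
        elif keyword == "identityfile" and applies:
            found.append(value)
    return found


config = """
Host backupserver
    IdentityFile /user/jolouwer/backup_key

Host *
    IdentityFile ~/.ssh/id_rsa
"""

print(identity_files_for("backupserver", config))
# ['/user/jolouwer/backup_key', '~/.ssh/id_rsa']
print(identity_files_for("otherserver", config))
# ['~/.ssh/id_rsa']
```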

Monday, November 26, 2012

Exponential function in SQL

In mathematics, the exponential function is the function e^x, where e is the number (approximately 2.718281828) such that the function e^x is its own derivative. The exponential function is used to model a relationship in which a constant change in the independent variable gives the same proportional change (i.e. percentage increase or decrease) in the dependent variable. The function is often written as exp(x), especially when it is impractical to write the independent variable as a superscript. The exponential function is widely used in physics, chemistry and mathematics.

When working with mathematical equations in Oracle, you will at some point come across the exponential function. Using it is simple: call the exp function as shown below;

SELECT exp(1) AS exp_result
FROM dual;


Running this query also shows the precision of exp as it is implemented in the Oracle database.
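The two properties mentioned above, that e^x is its own derivative and that a constant step in x gives a constant ratio, can be checked numerically. The sketch below uses Python's `math.exp` as a stand-in for Oracle's exp; Python floats carry roughly 15 to 17 significant digits, so expect fewer digits than in the Oracle result. The helper `numerical_derivative` is a hypothetical name for a plain central-difference approximation.

```python
import math


def numerical_derivative(f, x, h=1e-6):
    # Central difference: approximate slope of f around x.
    return (f(x + h) - f(x - h)) / (2 * h)


print(math.exp(1))  # about 2.718281828

for x in (0.0, 1.0, 2.5):
    # The slope of exp at x is (almost exactly) exp(x) itself.
    print(x, math.exp(x), numerical_derivative(math.exp, x))

# Increasing x by 1 always multiplies the result by the same factor, e.
for x in (0.0, 1.0, 2.0):
    print(math.exp(x + 1) / math.exp(x))
```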

Sunday, November 25, 2012

Rounding numbers in Oracle SQL

When working with numbers in an Oracle database, you will have to round them at some point. When writing SQL code, it is good to know that you have three primary options for rounding numbers in Oracle SQL: round, floor and ceil. Each has its own use and should be used correctly in your code.

The ceil function gives you the smallest integer greater than or equal to the given number. In other words, ceil rounds the number upwards to a whole number: for example, 1.4 becomes 2, and 1.5 or 1.6 also become 2.

select ceil(1.4) from dual;


select ceil(1.5) from dual;


The floor function does exactly the opposite: it returns the largest integer less than or equal to the given number, so it always rounds downwards. Below you can see rounding examples using floor.

select floor(1.4) from dual;


select floor(1.5) from dual;


Both floor and ceil are very convenient when you have to round a number to an integer; in many cases, however, rounding to an integer is not what you want. For "true" rounding you can use the SQL round function in an Oracle database.

When no additional argument is given, the round function rounds a number to an integer. For example, rounding 1.4432123421 returns 1.

select round(1.4432123421) from dual;


In most cases, however, it makes sense to pass the number of decimal places as a second argument. Below are some examples of rounding;

select round(1.4432123421,1) from dual;


select round(1.4432123421,2) from dual;


select round(1.4432123421,3) from dual;
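If you ever need to reproduce these results outside the database, note that Python's built-in round() rounds halves to the nearest even number, whereas Oracle's ROUND on NUMBER values rounds halves away from zero. The sketch below mirrors ceil, floor and round with Python's `decimal` module; the function names `oracle_ceil`, `oracle_floor` and `oracle_round` are hypothetical, and numbers are converted through `str` so that values like 1.4432123421 are not distorted by binary floating point.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_UP


def oracle_ceil(n):
    # Smallest integer greater than or equal to n.
    return Decimal(str(n)).to_integral_value(rounding=ROUND_CEILING)


def oracle_floor(n):
    # Largest integer less than or equal to n.
    return Decimal(str(n)).to_integral_value(rounding=ROUND_FLOOR)


def oracle_round(n, places=0):
    # ROUND_HALF_UP rounds ties away from zero, like ROUND on NUMBER values.
    exponent = Decimal(1).scaleb(-places)  # places=2 -> 1E-2
    return Decimal(str(n)).quantize(exponent, rounding=ROUND_HALF_UP)


print(oracle_ceil(1.4), oracle_ceil(1.5))    # 2 2
print(oracle_floor(1.4), oracle_floor(1.5))  # 1 1
print(oracle_round("1.4432123421"))          # 1
print(oracle_round("1.4432123421", 2))       # 1.44
print(oracle_round(2.5), round(2.5))         # 3 2
```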


Friday, November 23, 2012

Oracle generate XML from SQL

XML is used in numerous applications and application designs, for all the good and all the bad reasons. Whether you are using XML correctly or incorrectly in your application design is out of scope for this blog post. Whatever the reason, in some cases you need to generate an XML file from the result set of a SQL query. There are several ways of doing this, and most of them involve some custom coding. The Oracle database, however, also offers a simple way to return your result set in XML format: the DBMS_XMLGEN.GETXML function.

When you use DBMS_XMLGEN.GETXML in the most basic way, your XML is returned as a CLOB. As an example, we have a simple query on a table named testtable, as shown below:

SELECT tst.name,
       tst.location
FROM
    testtable tst
WHERE
    tst.location NOT LIKE ('Amsterdam');

This returns the result shown below, as can be expected from a simple SQL SELECT statement:

NAME                 LOCATION                                         
-------------------- --------------------------------------------------
Johan                Utrecht                                            
Krista               Utrecht                                            
Johan                De Meern                                           
Martin               De Meern 

However, what we want is to have the result set returned as a CLOB that holds an XML structure containing the rows. To do so, we use DBMS_XMLGEN.GETXML, as in the example below. Note the escaped quotes: because the query is passed as a string literal, each single quote inside it must be doubled (''). If you do not do this, you will run into an error.

SELECT DBMS_XMLGEN.GETXML(
       'SELECT tst.name, tst.location
        FROM
            testtable tst
        WHERE
            tst.location NOT LIKE (''Amsterdam'')')
FROM dual;

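This query returns a CLOB that holds the following XML structure:

<?xml version="1.0"?>
<ROWSET>
 <ROW>
  <NAME>Johan</NAME>
  <LOCATION>Utrecht</LOCATION>
 </ROW>
 <ROW>
  <NAME>Krista</NAME>
  <LOCATION>Utrecht</LOCATION>
 </ROW>
 <ROW>
  <NAME>Johan</NAME>
  <LOCATION>De Meern</LOCATION>
 </ROW>
 <ROW>
  <NAME>Martin</NAME>
  <LOCATION>De Meern</LOCATION>
 </ROW>
</ROWSET>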

This is the simplest way to do this. You can make it more usable and better tailored to your application, use the CLOB somewhere else in your code, or store it in a table; this, however, is the first step in building XML directly in your database.
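For reference, the structure DBMS_XMLGEN.GETXML produces by default is a ROWSET element with one ROW element per row and one element per column, named after the column in upper case. The Python sketch below builds the same layout from the example rows, for instance to compare it with what the database returns. `rows_to_rowset_xml` is a hypothetical helper, skipping NULL (None) values is an assumption meant to mirror the default null handling, and unlike the database output the result is not pretty-printed.

```python
import xml.etree.ElementTree as ET


def rows_to_rowset_xml(columns, rows):
    """Build a ROWSET/ROW XML document from query rows (hypothetical helper)."""
    rowset = ET.Element("ROWSET")
    for row in rows:
        row_element = ET.SubElement(rowset, "ROW")
        for column, value in zip(columns, row):
            if value is None:
                continue  # assumption: NULL columns are left out
            ET.SubElement(row_element, column.upper()).text = str(value)
    return '<?xml version="1.0"?>\n' + ET.tostring(rowset, encoding="unicode")


rows = [
    ("Johan", "Utrecht"),
    ("Krista", "Utrecht"),
    ("Johan", "De Meern"),
    ("Martin", "De Meern"),
]
print(rows_to_rowset_xml(["name", "location"], rows))
```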

Tuesday, November 20, 2012

Hadoop HBase localhost considerations

When installing Hadoop HBase on Linux (possibly only on Ubuntu), you might run into some strange networking problems when you try to start your newly installed HBase node in development mode, where you run only one node on your local system. The issue may first show up in your log files when you try to find out why things are not running as expected. One of the things you might see is something like this: Failed setting up proxy interface org.apache.hadoop.hbase.ipc.HRegionInterface to localhost/ after attempts=1

This message appears in your log file together with a complete Java stack trace. An example is shown below; note that it is only a WARN-level message, yet it can mess things up quite a bit:

2012-11-20 09:29:08,473 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of -ROOT-,,0.70236052 to localhost,34908,1353400143063, trying to assign elsewhere instead; retry=0
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy interface org.apache.hadoop.hbase.ipc.HRegionInterface to localhost/ after attempts=1
 at org.apache.hadoop.hbase.ipc.HBaseRPC.handleConnectionException(
 at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(
 at org.apache.hadoop.hbase.master.ServerManager.getServerConnection(
 at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(
 at org.apache.hadoop.hbase.master.AssignmentManager.assign(
 at org.apache.hadoop.hbase.master.AssignmentManager.assign(
 at org.apache.hadoop.hbase.master.AssignmentManager.assign(
 at org.apache.hadoop.hbase.master.AssignmentManager.assign(
 at org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(
 at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(
 at org.apache.hadoop.hbase.master.HMaster.finishInitialization(
 at org.apache.hadoop.hbase.master.HMasterCommandLine$
Caused by: Connection refused
 at Method)
 at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(
 at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(
 at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(
 at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(
 at $Proxy12.getProtocolVersion(Unknown Source)
 at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(
 at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(
 at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(
 at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(
 at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(
 ... 15 more
2012-11-20 09:29:08,476 WARN org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable location to assign region -ROOT-,,0.70236052

The root cause of this issue lies in your /etc/hosts file. If you check your /etc/hosts file, you will find entries like the ones below (in my case, my machine is named APEX1):

localhost apex1

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

The root cause is that apex1 resolves to which is incorrect, as it should resolve to (or an external IP). As my external IP is I created the following /etc/hosts configuration:

localhost apex1

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

This ensures that host names on your local machine are resolved correctly, so you can start your HBase installation on your development system.
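To check what a host name maps to before starting HBase, you can read the hosts file programmatically. The Python sketch below is a hypothetical helper, `addresses_for`, that lists the addresses a name is mapped to in hosts-file text and uses the standard `ipaddress` module to tell whether each one is a loopback address. The sample addresses come from the 192.0.2.0/24 documentation range and are not the ones from this post.

```python
import ipaddress


def addresses_for(hostname, hosts_text):
    """Return (address, is_loopback) pairs for `hostname` in hosts-file text."""
    result = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        if hostname in names:
            result.append((address, ipaddress.ip_address(address).is_loopback))
    return result


sample = """
127.0.0.1   localhost
192.0.2.15  apex1   # example address
::1         ip6-localhost ip6-loopback
"""
print(addresses_for("apex1", sample))      # [('192.0.2.15', False)]
print(addresses_for("localhost", sample))  # [('127.0.0.1', True)]

# On a real machine you could pass the contents of /etc/hosts instead:
# with open("/etc/hosts") as f:
#     print(addresses_for("apex1", f.read()))
```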

Create Hadoop HBASE development system

Apache HBase is the Hadoop database, a distributed, scalable, big data store. HBase is a type of "NoSQL" database. "NoSQL" is a general term meaning that the database isn't an RDBMS which supports SQL as its primary access language, but there are many types of NoSQL databases: BerkeleyDB is an example of a local NoSQL database, whereas HBase is very much a distributed database. Technically speaking, HBase is really more a "Data Store" than "Data Base" because it lacks many of the features you find in an RDBMS, such as typed columns, secondary indexes, triggers, and advanced query languages, etc.

HBase isn't suitable for every problem.

First, make sure you have enough data. If you have hundreds of millions or billions of rows, then HBase is a good candidate. If you only have a few thousand/million rows, then using a traditional RDBMS might be a better choice due to the fact that all of your data might wind up on a single node (or two) and the rest of the cluster may be sitting idle.

Second, make sure you can live without all the extra features that an RDBMS provides (e.g., typed columns, secondary indexes, transactions, advanced query languages, etc.) An application built against an RDBMS cannot be "ported" to HBase by simply changing a JDBC driver, for example.

Consider moving from an RDBMS to HBase as a complete redesign as opposed to a port.

Third, make sure you have enough hardware. Even HDFS doesn't do well with anything less than 5 DataNodes (due to things such as HDFS block replication which has a default of 3), plus a NameNode.

HBase can run quite well stand-alone on a laptop, but this should be considered a development configuration only. That is the topic of this blog post: how to quickly deploy a very light and simple installation of HBase on your laptop, purely for development purposes, so you can develop and test on the go.

Installing HBase is easy and straightforward. The first step is to download the latest version of HBase onto your Linux laptop, using one of the Apache download mirrors listed on the Apache website.

Once it is downloaded, you have to do a couple of things. First, go to the location where you unpacked the downloaded HBase version, open the conf directory and edit the file hbase-site.xml so that you can use it locally. As an example, the configuration file on my Ubuntu development laptop is shown below;

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

The values you have to change are those for hbase.rootdir and . As this is a local installation, you can point them to local directories, in the same way as I have done in the example above.

When you have completed the hbase-site.xml file, you can start HBase from the bin directory by executing , which will start your local HBase installation.

As you may have read, HBase requires at least Java 1.6 to be installed on your system and the JAVA_HOME variable to be set. If JAVA_HOME is not set or Java is not installed, you will see an error message like the one below;

|      Error: JAVA_HOME is not set and Java could not be found         |
| Please download the latest Sun JDK from the Sun Java web site        |
|       > <                      |
|                                                                      |
| HBase requires Java 1.6 or later.                                    |
| NOTE: This script will find Sun Java whether you install using the   |
|       binary or the RPM based installer.                             |

If JAVA_HOME is set, your local HBase installation will start and you will see a message like the one below;
starting master, logging to /home/louwersj/hbase/hbase-0.94.2/bin/../logs/hbase-louwersj-master-apex1.out

Thursday, November 15, 2012

Costly mistake of ignoring Big Data

Big data has been around in the market for some time now. To some people it is living up to expectations; to others it is still a buzzword, a hollow phrase. Regardless of whether you are in the big-data believer or the big-data skeptic group, it is undeniable that data is growing at a rapid pace and that the need for ways of gathering, storing, processing and making sense of it is growing as well. It is also undeniable that the resources and computing methods for coping with the growing amount of data are changing.

When we look at big data from a business-demand perspective rather than a technical one, we also see that big data in essence makes a lot of sense and can bring a lot of good to a company if used and implemented correctly.

For a C-level manager in a large enterprise it is vital to look at big data: there is a lot of money to be made, or lost. Forbes published an article titled "The Deadly Cost Of Ignoring Big Data: $71.2 Million Per Year", which already has a catchy title, but some of the content is even more breathtaking;

"If your company’s reluctant to tackle the Big Data challenge because you think it’ll be too expensive or disruptive, you might want to consider the alternative: companies without aggressive plans for managing and exploiting the explosion in data report they are losing an average of $71.2 million per year. That’s not a cumulative total; rather, that’s $71.2 million from each company. Every year."

For a C-level manager this is something to consider; what is of vital importance, however, is how your competitors are doing in the big-data arena and whether you still have time to ramp up. Oracle recently surveyed 333 C-level executives across 11 industries. Below are some of the key findings;
  • 94% of C-level executives say their organization is collecting and managing more business information today than two years ago, by an average of 86% more
  • 29% of executives give their organization a “D” or “F” in preparedness to manage the data deluge
  • 93% of executives believe their organization is losing revenue – on average, 14% annually – as a result of not being able to fully leverage the information they collect
  • Nearly all surveyed (97%) say their organization must make a change to improve information optimization over the next two years
  • Industry-specific applications are an important part of the mix; 77% of organizations surveyed use them today to run their enterprise—and they are looking for more tailored options
  • Executives in the communications industry note they are most prepared to manage a data deluge, with 20% giving themselves an "A"
  • In contrast, public sector, healthcare, and utilities industries are least prepared to handle the data deluge – with 41% of public sector executives, 40% of healthcare executives, and 39% of utilities executives giving themselves a "D" or "F" preparedness rating
  • The communications, manufacturing, and retail industries lose the lowest estimated percentage of annual revenue from their current data management processes – 10%
  • The oil and gas and life sciences industries lose the greatest estimated percentage of annual revenue, 22% and 20% respectively, from their current data management processes
When talking about big data, you first need a solid understanding of what big data is and what it means to you. I have published some blog posts on this subject already; below is what big data is according to Gartner. The graph below is the Gartner Hype Cycle for 2012, and you can see that big data is climbing towards the peak of inflated expectations. Gartner expects it to take between 2 and 5 years before big data reaches the plateau of productivity, which is quite fast.

As we know, big data is a subject that is too big to be treated as a single item. Gartner has a specific hype cycle for the sub-components they think big data consists of, which can give you some guidance on where each currently stands and how it is expected to move towards the plateau.

The Gartner hype cycles can help enterprises understand where big data is moving and, at the same time, help IT companies understand where enterprises will most likely be moving in the big-data arena in the coming years. As many large enterprises start investigating and implementing some sort of big-data strategy, followed by mid-sized companies, it will be vital for survival to make sure your company is at least aware of the big-data move and how it will influence your market segment. Companies that do not start looking into this in time will suffer the financial consequences and may ultimately not survive missing the big-data boat.

Wednesday, November 14, 2012


Companies like to work with schedules; think, for example, of equipment that needs to be checked every month. When you have a database that tracks the dates on which equipment was checked, you can write a query to find out whether a piece of equipment is due to be checked again or is overdue. You could use simple date arithmetic and some tricks to calculate the difference between one date and another (for example sysdate), but there is a much easier way in an Oracle database: the MONTHS_BETWEEN function available in Oracle SQL.

The MONTHS_BETWEEN function returns the number of months between one date and another. For example, if we have a table called service_log and we store the date of a service in the column service_date, we could write a query like the one below;

SELECT
        MONTHS_BETWEEN(sysdate, service_date) as difference
FROM
        service_log;

This shows the number of months between the current date (sysdate) and each service date.


By combining this with the round function, you can quickly build a report that shows whether a piece of equipment is serviced on a monthly basis. An example using the round function is shown below;

SELECT
        ROUND(MONTHS_BETWEEN(sysdate, service_date),2) as difference
FROM
        service_log;
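To see how the fractional part behaves, here is a Python sketch that follows the rules described in the Oracle documentation for MONTHS_BETWEEN: if both dates fall on the same day of the month, or both on the last day of their month, the result is a whole number; otherwise the fraction is based on a 31-day month and also takes the time of day into account. The function names and example dates are hypothetical.

```python
import calendar
from datetime import datetime


def is_last_day_of_month(d):
    return d.day == calendar.monthrange(d.year, d.month)[1]


def months_between(date1, date2):
    """Sketch of Oracle's MONTHS_BETWEEN(date1, date2)."""
    months = (date1.year - date2.year) * 12 + (date1.month - date2.month)
    if date1.day == date2.day or (
        is_last_day_of_month(date1) and is_last_day_of_month(date2)
    ):
        return float(months)  # always a whole number in these cases
    # Otherwise: difference in days plus time of day, over a 31-day month.
    seconds1 = date1.hour * 3600 + date1.minute * 60 + date1.second
    seconds2 = date2.hour * 3600 + date2.minute * 60 + date2.second
    days = (date1.day - date2.day) + (seconds1 - seconds2) / 86400
    return months + days / 31


last_service = datetime(2012, 10, 1)
today = datetime(2012, 11, 14)
difference = round(months_between(today, last_service), 2)
print(difference)  # 1.42
print("overdue" if difference > 1 else "on schedule")  # overdue
```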

Sunday, November 11, 2012

Inverse trigonometric functions in SQL

Databases are primarily used by applications to store and retrieve data, and occasionally by users who query the database directly. When actions outside the domain of storing and retrieving data are needed, many people who lack SQL development knowledge turn to the application layer of the stack to build in calculations and logic. In some cases this makes sense; in other cases it would make sense to build some of the logic and calculations into the database side of the application stack.

To make full use of the database side of the stack, you need to understand what the Oracle database offers a developer by default. For example, not all Oracle SQL (PL/SQL) developers know that all the mathematical inverse trigonometric functions are directly available within the SQL language.

In mathematics, the inverse trigonometric functions (occasionally called cyclometric functions) are the inverse functions of the trigonometric functions with suitably restricted domains.

The notations sin⁻¹, cos⁻¹, tan⁻¹, etc. are often used for arcsin, arccos, arctan, etc., but this convention logically conflicts with the common semantics for expressions like sin²(x), which refer to numeric power rather than function composition, and therefore may result in confusion between multiplicative inverse and compositional inverse.

In computer programming languages the functions arcsin, arccos, arctan, are usually called asin, acos, atan. Many programming languages also provide the two-argument atan2 function, which computes the arctangent of y / x given y and x, but with a range of (−π, π].

Within the Oracle database we have the functions ACOS, ASIN, ATAN and ATAN2 available, as in many other programming languages. All are very straightforward to use. Below you can find the examples:

As an example for the Oracle SQL ACOS function you can execute the below;

This will give you the following result;

This is quite a precise number, which might not be needed in all cases, so you can round it, for example to 3 decimal places, by executing it in the following manner;

This will provide you with the following outcome (rounding can be done to any number of decimal places you like, simply by combining the ROUND function with the ACOS function);

As an example to use the Oracle SQL ASIN function you can execute the below;

This will give you the following result;

As an example to use the Oracle SQL ATAN function you can execute the below;

This will give you the following result;

As an example to use the Oracle SQL ATAN2 function you can execute the below;

This will give you the following result;

All of the functions mentioned above are, by default, precise to 30 decimal digits (unless you use the ROUND function as shown in the ACOS example).
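The same four functions exist in Python's standard `math` module under the same names, which makes it easy to explore their ranges and the reason atan2 exists. The argument 0.5 is an arbitrary choice for illustration, and Python floats are double precision, so they carry far fewer digits than the 30 mentioned above. Like Oracle's ATAN2(n1, n2), `math.atan2(y, x)` takes the numerator first.

```python
import math

x = 0.5  # arbitrary example value in [-1, 1]

print(math.acos(x))            # arccos, result in [0, pi]
print(round(math.acos(x), 3))  # 1.047, similar to ROUND(ACOS(0.5), 3)
print(math.asin(x))            # arcsin, result in [-pi/2, pi/2]
print(math.atan(x))            # arctan, result in (-pi/2, pi/2)

# atan2 receives y and x separately, so it can tell the quadrants apart.
print(math.atan2(-1, -1))  # -3*pi/4: point in the third quadrant
print(math.atan(-1 / -1))  # pi/4: the quotient alone loses the quadrant
print(math.atan2(0.0, -1.0) == math.pi)  # True: pi is included in the range
```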

Saturday, November 10, 2012

Oracle absolute value function

Anyone who does more than simple SELECT statements in the database and starts working on equations will come across mathematical functions every now and then. The need for a function that returns the absolute value of a number is about more than just making sure a number is positive.

In mathematics, the absolute value (or modulus) | a | of a real number a is the non-negative value of a without regard to its sign. Namely, | a | = a for a positive a, | a | = −a for a negative a, and | 0 | = 0. For example, the absolute value of 3 is 3, and the absolute value of −3 is also 3. The absolute value of a number may be thought of as its distance from zero.

Generalizations of the absolute value for real numbers occur in a wide variety of mathematical settings. For example an absolute value is also defined for the complex numbers, the quaternions, ordered rings, fields and vector spaces. The absolute value is closely related to the notions of magnitude, distance, and norm in various mathematical and physical contexts.

When you want to have the absolute value from a number (or any other type that can be converted to a numeric type) you can use the abs SQL function in the Oracle database.

for example;

SELECT ABS(-20) "absovalue" FROM DUAL;
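Python's built-in `abs` behaves the same way for real numbers, and it also illustrates the generalization mentioned above: for a complex number it returns the distance from zero in the complex plane. This is only an illustration; Oracle's ABS works on numeric values, not on complex numbers.

```python
from decimal import Decimal

print(abs(-20))              # 20, like SELECT ABS(-20) FROM DUAL
print(abs(Decimal("-3.5")))  # 3.5
print(abs(0))                # 0

# For a complex number a + bi, |a + bi| = sqrt(a**2 + b**2),
# its distance from 0 in the complex plane.
print(abs(3 + 4j))           # 5.0
```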


Oracle SQL current_timestamp function

In many (database) applications it is important to keep some sort of log of all kinds of actions, for example when a record was created or changed. For a system that mainly interacts with human users, a date and time precise to the second is in most cases sufficient. However, in some cases, particularly systems with a lot of machine-to-machine communication, this is not accurate enough.

In many cases where Oracle SQL developers implement timestamping in an application, they use the sysdate function, which returns the current date and time based on the clock of the operating system the Oracle database is running on.

There is, however, a more precise way of getting the exact date and time: the current_timestamp function of the database.

You can extend the current_timestamp with a precision parameter. If you do not provide a precision parameter the default will be 6.

For example if you do;


you will get

and if you do;


you will get

As you can see, the precision determines the number of fractional-second digits in your timestamp (3 digits is milliseconds, 6 digits is microseconds). Depending on the nature of your application and the accuracy needed for the timestamps, this can be very important and worth looking into.
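The effect of the precision parameter on the fractional seconds can be sketched in Python. `timestamp_with_precision` is a hypothetical helper that truncates the extra digits; the truncation is an assumption for illustration, not a statement about how Oracle handles them. Python's `datetime` only goes down to microseconds (6 digits), whereas Oracle's TIMESTAMP precision goes up to 9.

```python
from datetime import datetime


def timestamp_with_precision(ts, precision=6):
    """Format `ts` with `precision` fractional-second digits (0 to 6).

    Assumption for illustration: extra digits are truncated, not rounded.
    """
    if not 0 <= precision <= 6:
        raise ValueError("precision must be between 0 and 6 for datetime")
    base = ts.strftime("%Y-%m-%d %H:%M:%S")
    if precision == 0:
        return base
    fraction = f"{ts.microsecond:06d}"[:precision]
    return f"{base}.{fraction}"


ts = datetime(2012, 11, 10, 14, 30, 15, 123456)
print(timestamp_with_precision(ts))     # 2012-11-10 14:30:15.123456
print(timestamp_with_precision(ts, 2))  # 2012-11-10 14:30:15.12
print(timestamp_with_precision(ts, 0))  # 2012-11-10 14:30:15
```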

Tuesday, October 30, 2012

Why ADF Essentials is important for Oracle

Oracle was already using the Java programming language intensively in its products, but this has gone into full swing in the last couple of years. After acquiring BEA, whose product became the Oracle WebLogic application server, and after acquiring Sun, Java is becoming the main application language for Oracle.

We have recently seen legal battles between Google and Oracle over parts of Java, which shows that Oracle is willing to go into battle mode for Java. However, this does not mean that they want to make Java as proprietary as possible. One of the latest cases showing this is the release of ADF Essentials. ADF is the Oracle Application Development Framework, a proprietary development framework built by Oracle to support the development community, which is also used heavily by Oracle development teams internally for product development. One of the limitations of ADF is that you (officially) need the Oracle WebLogic server to run it. This effectively rules out using ADF to create open-source products, and it limits the adoption of ADF because it creates a vendor lock-in with Oracle for the application server.

Now Oracle has launched ADF Essentials, which can run not only on WebLogic but also on other application servers, for example GlassFish or IBM WebSphere. ADF Essentials can be used for free, and you no longer have a vendor lock-in on the application server. ADF Essentials is a stripped-down version of the commercial ADF version.

ADF Essentials is important for Oracle because they hope it will boost the adoption of ADF, not only of the Essentials version but also of the commercial version. The commercial vision behind Essentials is (most likely) that it will make ADF more popular and get more developers working the ADF way. This should not only enlarge the user base of ADF Essentials but also increase the adoption of the commercial version of ADF, and thereby boost the license revenue from WebLogic sales.

If you are a company that will be building applications based on Oracle ADF Essentials, it is good to take note of the ADF Essentials license policy. It states the following;

You are bound by the Oracle Technology Network ("OTN") License Agreement terms. The OTN License Agreement terms also apply to all updates you receive under your Technology Track subscription.

This means you will have to review the OTN License Agreement in depth to be sure what you can and cannot do with ADF Essentials from a legal point of view. The most important thing to know is the following: Oracle ADF Essentials is a free to develop and deploy packaging of the key core technologies of Oracle ADF.