Johan Louwers - Tech blog: November 2012

Monday, November 26, 2012

Exponential function in SQL

In mathematics, the exponential function is the function e^x, where e is the number (approximately 2.718281828) such that the function e^x is its own derivative. The exponential function is used to model a relationship in which a constant change in the independent variable gives the same proportional change (i.e. percentage increase or decrease) in the dependent variable. The function is often written as exp(x), especially when it is impractical to write the independent variable as a superscript. The exponential function is widely used in physics, chemistry and mathematics.

When working with mathematical equations in Oracle you will come into contact with this exponential function at one point in time. Using it is quite simple, you can use it by calling the exp function in a way shown below;

SELECT 
      exp(1) AS exp_result 
FROM dual;


EXP_RESULT
----------
2.71828182845904523536028747135266249776

The above example also shows the precision of exp as it is implemented in the Oracle database.

Sunday, November 25, 2012

Rounding numbers in Oracle SQL

When working with numbers in an Oracle database you will have to round the numbers at some point in time. When you are creating SQL code it is good to know that you have primarily 3 options for rounding numbers in Oracle SQL. The 3 main options you have at your use are round, floor and ceil. All have there own use and should be used in the correct way in your code.

The ceil option wil give you the smallest integer greater or equal to the given number. Meaning; the ceil option will round the number upwards to a whole number. for example 1.4 will become 2 however 1.5 or 1.6 will also become 2.

select ceil(1.4) from dual;

CEIL(1.4)
---------
        2 


select ceil(1.5) from dual;

CEIL(1.5)
---------
        2

The floor option will do exactly the opposite and will round down in the same way as ceil is doing downwards. Below you can see a rounding example using floor.

select floor(1.4) from dual;

FLOOR(1.4)
----------
         1 


select floor(1.5) from dual;

FLOOR(1.5)
----------
         1

Both floor and ceil are very convenient options when you have to round a number to a integer however in many cases rounding to an integer is not what you would like. For "true" rounding you can make use of the SQL round function in an Oracle database.

When no additional information is given the round function will round a number to a integer like result. For example if we round 1.4432123421 it provide a return set of 1.

select round(1.4432123421) from dual;

ROUND(1.4432123421)
-------------------
                  1

However, giving some options will make sense in most cases, below are some examples of rounding;

select round(1.4432123421,1) from dual;

ROUND(1.4432123421,1)
---------------------
                  1.4 


select round(1.4432123421,2) from dual;

ROUND(1.4432123421,2)
---------------------
                 1.44


select round(1.4432123421,3) from dual;


ROUND(1.4432123421,3)
---------------------
                1.443

Friday, November 23, 2012

Oracle generate XML from SQL

XML is used in numerous applications and application designs because of all the good and all the bad reasons. If you are using XML correctly or incorrectly in your application design is out of scope of this blogpost. Whatever the reason, in some cases you need to generate a XML file from the result set of your database SQL query. There are several ways of doing this and most of them involve some custom coding. The Oracle database is however also equipped with a simple solution to return your result set in a XML format. If you are in need to have it returned in a XML format you can make use of the DBMS_XMLGEN.GETXML options that the database provides you.

When you use DBMS_XMLGEN.GETXML in the most basic way your XML will be returned as a CLOB object. As a example we have a simple query on a table named testtable as shown below

SELECT 
      tst.name,
      tst.location 
FROM 
    testtable tst
WHERE 
    tst.location NOT LIKE ('Amsterdam')

This provides the result as shown below as can be expected from a simple SQL select statement:

NAME                 LOCATION                                         
-------------------- --------------------------------------------------
Johan                Utrecht                                            
Krista               Utrecht                                            
Johan                De Meern                                           
Martin               De Meern

However what we wanted to do is to have the resultset returned as CLOB object which holds a XML structure with the return set. To be able to do so we have to make use of DBMS_XMLGEN.GETXML. This is done in the below example. Do make note of the escape character for '. If you do not use it in this case you will run into an error.

SELECT DBMS_XMLGEN.GETXML
('
  SELECT 
        tst.name,
        tst.location 
  FROM 
      testtable tst
  WHERE 
      tst.location NOT LIKE (''Amsterdam'')
  ')
FROM 
    dual;

This query will return you a CLOB object which holds the following XML structure:

<?xml version="1.0"?>
<ROWSET>
<ROW>
<NAME>Johan</NAME>
<LOCATION>Utrecht</LOCATION>
</ROW>
<ROW>
<NAME>Krista</NAME>
<LOCATION>Utrecht</LOCATION>
</ROW>
<ROW>
<NAME>Johan</NAME>
<LOCATION>De Meern</LOCATION>
</ROW>
<ROW>
<NAME>Martin</NAME>
<LOCATION>De Meern</LOCATION>
</ROW>
</ROWSET>

This is the most simple and most easy way to do this. You can however make it more usable and more designed for your application and use the CLOB object somewhere else in your code or store it in a table... this however is the first step in building XML directly in your database.

Tuesday, November 20, 2012

Hadoop HBase localhost considerations

When installing Hadoop HBase on (Ubuntu?) Linux you might run into some strange problems concerning networking when you try to start your new installed HBase node in a development mode where you only run one node on your local system. The issue might manifest itself first in your log files when you try to find out why things are not running as expected. On of the things you might see is something like this; Failed setting up proxy interface org.apache.hadoop.hbase.ipc.HRegionInterface to localhost/127.0.0.1:34908 after attempts=1

This message is somewhere in your logfile joining a complete java error stack, an example of the error stack can be found below and in fact is only a WARN level message however can mess things up quite a bid;

2012-11-20 09:29:08,473 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of -ROOT-,,0.70236052 to localhost,34908,1353400143063, trying to assign elsewhere instead; retry=0
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy interface org.apache.hadoop.hbase.ipc.HRegionInterface to localhost/127.0.0.1:34908 after attempts=1
 at org.apache.hadoop.hbase.ipc.HBaseRPC.handleConnectionException(HBaseRPC.java:291)
 at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:259)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1313)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1269)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1256)
 at org.apache.hadoop.hbase.master.ServerManager.getServerConnection(ServerManager.java:550)
 at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:483)
 at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1640)
 at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1363)
 at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1338)
 at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1333)
 at org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:2212)
 at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:632)
 at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:529)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:344)
 at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:220)
 at java.lang.Thread.run(Thread.java:679)
Caused by: java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
 at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
 at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:416)
 at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:462)
 at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1150)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1000)
 at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
 at $Proxy12.getProtocolVersion(Unknown Source)
 at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:183)
 at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:335)
 at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:312)
 at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:364)
 at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:236)
 ... 15 more
2012-11-20 09:29:08,476 WARN org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable location to assign region -ROOT-,,0.70236052

The root cause of this issue resides in your /etc/hosts file. If you check your /etc/hosts file you will find a entry something like the one below (in my case mu machine is named APEX1)

127.0.0.1 localhost
127.0.1.1 apex1

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

The root cause is that apex1 resolves to 127.0.1.1 which is incorrect as it should resolve to 127.0.0.1 (or a external IP). As my external IP is 192.168.58.10 I created the following /etc/hosts configuration;

127.0.0.1 localhost
192.168.58.10 apex1

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

This will ensure that the resolving of your host processes on your localhost will be done correctly and you can start your HBase installation correctly on your development system.

Create Hadoop HBASE development system

Apache HBase is the Hadoop database, a distributed, scalable, big data store. HBase is a type of "NoSQL" database. "NoSQL" is a general term meaning that the database isn't an RDBMS which supports SQL as its primary access language, but there are many types of NoSQL databases: BerkeleyDB is an example of a local NoSQL database, whereas HBase is very much a distributed database. Technically speaking, HBase is really more a "Data Store" than "Data Base" because it lacks many of the features you find in an RDBMS, such as typed columns, secondary indexes, triggers, and advanced query languages, etc.

HBase isn't suitable for every problem.

First, make sure you have enough data. If you have hundreds of millions or billions of rows, then HBase is a good candidate. If you only have a few thousand/million rows, then using a traditional RDBMS might be a better choice due to the fact that all of your data might wind up on a single node (or two) and the rest of the cluster may be sitting idle.

Second, make sure you can live without all the extra features that an RDBMS provides (e.g., typed columns, secondary indexes, transactions, advanced query languages, etc.) An application built against an RDBMS cannot be "ported" to HBase by simply changing a JDBC driver, for example.

Consider moving from an RDBMS to HBase as a complete redesign as opposed to a port.
Third, make sure you have enough hardware. Even HDFS doesn't do well with anything less than 5 DataNodes (due to things such as HDFS block replication which has a default of 3), plus a NameNode.

HBase can run quite well stand-alone on a laptop - but this should be considered a development configuration only. And this is the topic we will touch in this blogpost, how can you quickly deploy a very light and simple installation of HBase on your laptop for pure development purposes so you can develop and test on the go.

Installing HBase is quite easy and straight forward. First step is to download the latest version of HBase from the website onto your Linux laptop. You can use one of the download mirrors from Apache which are listed on the Apache website.

When downloaded you will have to do a couple of things. First is that you have to go to the location where you have unpacked the downloaded HBase version and go to conf directory and edit the file hbase-site.xml in such a form that you can use it locally. As an example the configuration file on my ubuntu development laptop is shown below;

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///home/louwersj/hbase/hbase_data</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/louwersj/hbase/zookeeper_data</value>
</property>
</configuration>

The values that you have to changes are the values for hbase.rootdir and hbase.zookeeper.property.dataDir . As this is a local installation you can point them to local directories in the same fashion as I have done in the example above.

When you have completed the hbase-site.xml file you can start HBase from the bin directory by executing start-hbase.sh which will start your local HBase installation.

As you might have been reading HBase will require you to have at least JAVA 1.6 installed on your system and will need the JAVA_HOME variable set. If JAVA_HOME is not set or JAVA is not installed you will notice a error message something like the one below;

+======================================================================+
|      Error: JAVA_HOME is not set and Java could not be found         |
+----------------------------------------------------------------------+
| Please download the latest Sun JDK from the Sun Java web site        |
|       > http://java.sun.com/javase/downloads/ <                      |
|                                                                      |
| HBase requires Java 1.6 or later.                                    |
| NOTE: This script will find Sun Java whether you install using the   |
|       binary or the RPM based installer.                             |
+======================================================================+

If JAVA_HOME is set your local HBase installation will start and you will see a message like the one below;

starting master, logging to /home/louwersj/hbase/hbase-0.94.2/bin/../logs/hbase-louwersj-master-apex1.out

Thursday, November 15, 2012

Costly mistake of ignoring Big Data

Big-data is already around us in the market for some time now and to some people it is living up to the expectations and to some people it is still a buzzword representing a hollow phrase. Regardless of the fact if you are in the big-data believer or in the big-data skeptic group it is undeniable that data is growing at a rapid speed and that the need for ways of gathering, storing, computing and making meaning out of is also growing. Also it is a undeniable fact that the resources and computing methods to cope with the growing number of data are changing.

When looking from a business demand perspective to big-data and not from a technical perspective to the subject we will also see that big-data is essence makes a lot of sense and can bring a lot of good to a company if used and implemented correctly.

As a C level manager in a large enterprise it is vital to look at big data. There is a lot of money to be made or a lot of money to be lost for a company. Forbes released an article named "The Deadly Cost Of Ignoring Big Data: $71.2 Million Per Year" which in itself already has a catchy title however some of the content is even more breathtaking;

"If your company’s reluctant to tackle the Big Data challenge because you think it’ll be too expensive or disruptive, you might want to consider the alternative: companies without aggressive plans for managing and exploiting the explosion in data report they are losing an average of $71.2 million per year.That’s not a cumulative total; rather, that’s $71.2 million from each company. Every year."

As a C level manager this is something to consider, however what is of vital importance is how your competitors are doing in the big-data arena and if you still have time to ramp up. Oracle has done a recent survey 333 C-level executives across 11 industries. Below are some of the key findings;

94% of C-level executives say their organization is collecting and managing more business information today than two years ago, by an average of 86% more
29% of executives give their organization a “D” or “F” in preparedness to manage the data deluge
93% of executives believe their organization is losing revenue – on average, 14% annually – as a result of not being able to fully leverage the information they collect
Nearly all surveyed (97%) say their organization must make a change to improve information optimization over the next two years
Industry-specific applications are an important part of the mix; 77% of organizations surveyed use them today to run their enterprise—and they are looking for more tailored options
Executives in the communications industry note they are most prepared to manage a data deluge, with 20% giving themselves an "A"
In contrast, public sector, healthcare, and utilities industries are least prepared to handle the data deluge – with 41% of public sector executives, 40% of healthcare executives, and 39% of utilities executives giving themselves a "D" or "F" preparedness rating
The communications, manufacturing, and retail industries lose the lowest estimated percentage of annual revenue from their current data management processes – 10%
The oil and gas and life sciences industries lose the greatest estimated percentage of annual revenue, 22% and 20% respectively, from their current data management processes

When talking about big-data you first have to have a solid understanding of what big-data is and what it is to you, I have been publishing some blogposts on this subject already, below is what big-data is according to Gartner. The below graph is the Garter Hype Cycle for 2012 and you see that big-data is climbing up to the peak of inflated expectations. The expectation of Gartner is that it will take between 2 till 5 years before big-data will move up onto the plateau of productivity which is quite fast.

As we know big-data is a subject which is to big to be put on its own. Gartner has a specific hype cycle for the sub-components they think bog-data consists out of which can give you some guidance of where it currently stands and what the expected flow towards the plateau will be.

The hype cycles from Gartner can help enterprises to gain understanding of where big-data is moving and at the same moment can help IT companies to understand where enterprise will most likely be moving in the upcoming years in the big-data arena. As large numbers of large enterprises will start investigating and implementing some sort of big-data strategy and will be followed by mid sized companies it will be of vital importance for survival to make sure that your company is a least aware of the big-data move and how this will play a role of influence in your market segment. Companies who do not start looking into this in due time will suffer the financial consequences and finally might not survive missing he big-data boat.

Wednesday, November 14, 2012

MONTHS_BETWEEN function SQL

Companies like to work with schedules, think for example about equipment that needs to be checked every month. When you have a database where you track the dates when the equipment was checked you could write a query to figure out if a equipment part is due to be checked again or is overdue. When writing such a query you could use simple date calculations and trick some things to see the difference between one date and another date (for example sysdate) however there is a much more easy way to do this in an Oracle database using Oracle SQL when you make use of the MONTHS_BETWEEN function which is available.

The MONTHS_BETWEEN function shows the returns the number of months between one date and another date. For example if we have a table called service_log and we store the date of a service in the column service_date we could do something like the query below;

  SELECT
        MONTHS_BETWEEN(sysdate, service_date) as difference
  FROM 
      service_log

This will show the months between the current date (sysdate) and the last service date as shown below;

  DIFFERENCE
  ----------
  4.05244063620071684587813620071684587814 
  3 
  2.05244063620071684587813620071684587814 
  1.98792450716845878136200716845878136201 
  0.9556664426523297491039426523297491039427

When combining this with a round function you will be able to quickly build a report to show you if a equipment part is serviced on a monthly basis or not. An example using the round function is shown below;

  SELECT
        ROUND(MONTHS_BETWEEN(sysdate, service_date),2) as difference
  FROM 
      service_log

Sunday, November 11, 2012

Inverse trigonometric functions in SQL

Databases are primarily used to store and retrieve data by applications and occasionally by users who directly query the database. When actions, outside the domain of storing and retrieving data are needed a lot of people, who do lack SQL development knowledge will go to the application layer of a stack to build in calculation and logic functions. In some cases this makes sense however in some other cased it would make sense to build some of the logic and some of the calculations into the database side of the application stack.

To be able to make full use of the application side of the stack it is needed to understand what is available to a developer from within the Oracle database by default. For example it is not known to all Oracle SQL (PL/SQL) developers that all mathematical inverse trigonometric functions are directly available for use within the SQL language.

In mathematics, the inverse trigonometric functions (occasionally called cyclometric functions) are the inverse functions of the trigonometric functions with suitably restricted domains.

The notations sin⁻¹, cos⁻¹, tan⁻¹, etc. are often used for arcsin, arccos, arctan, etc., but this convention logically conflicts with the common semantics for expressions like sin²(x), which refer to numeric power rather than function composition, and therefore may result in confusion between multiplicative inverse and compositional inverse.

In computer programming languages the functions arcsin, arccos, arctan, are usually called asin, acos, atan. Many programming languages also provide the two-argument atan2 function, which computes the arctangent of y / x given y and x, but with a range of (−π, π].

Within the Oracle database we have the functions ACOS, ASIN, ATAN and ATAN2 available as in many other programming languages. All are very straight forward in use. Below you can find the examples:

ACOS
As an example for the Oracle SQL ACOS function you can execute the below;

  
  SELECT 
        acos(.3) 
  FROM 
      dual;

This will give you the following result;

  1.2661036727794991112593187304122222822

Which is quite a precise number and might not be needed in all cases so you can apply a round to if for example from 3 which can be done by executing it in the following manner;

  
  SELECT 
        round(acos(.3),4) 
  FROM 
      dual;

This will provide you with the following outcome (rounding can be done to any number you like simply by using the ROUND function in combination with the ACOS function;

  1.2661

ASIN
As an example to use the Oracle SQL ASIN function you can execute the below;

  
  SELECT 
        asin(.3) 
  FROM 
      dual;

This will give you the following result;

  0.3046926540153975079720029612275291599

ATAN
As an example to use the Oracle SQL ATAN function you can execute the below;

   
  SELECT 
        atan(.3) 
  FROM 
      dual;

This will give you the following result;

  0.2914567944778670919956046214328911935013

ATAN2
As an example to use the Oracle SQL ATAN2 function you can execute the below;

  
  SELECT 
        atan2(.4,.5) 
  FROM 
      dual;

This will give you the following result;

  0.674740942223552663056520973609813615077

All above mentioned functions are by default precise to 30 decimal digits (unless you use the ROUND function as showed in the ACOS example).

Saturday, November 10, 2012

Oracle absolute value function

Anyone who will be doing more than only simple select statements in the database and will start working on equations in the database will come along some mathematical functions every now and then. The need for a function to retrieve the absolute value of a number is more then just making sure it is a positive number.

In mathematics, the absolute value (or modulus) | a | of a real number a is the non-negative value of a without regard to its sign. Namely, | a | = a for a positive a, | a | = −a for a negative a, and | 0 | = 0. For example, the absolute value of 3 is 3, and the absolute value of −3 is also 3. The absolute value of a number may be thought of as its distance from zero.

Generalizations of the absolute value for real numbers occur in a wide variety of mathematical settings. For example an absolute value is also defined for the complex numbers, the quaternions, ordered rings, fields and vector spaces. The absolute value is closely related to the notions of magnitude, distance, and norm in various mathematical and physical contexts.

When you want to have the absolute value from a number (or any other type that can be converted to a numeric type) you can use the abs SQL function in the Oracle database.

for example;

SELECT ABS(-20) "absovalue" FROM DUAL;

 absovalue
----------
        20

Oracle SQL current_timestamp function

In many (database) applications it is important to keep some sort of logging on all kinds of actions taken. For example when was a record created or when was a record changed. When we are talking about a system which mainly interact with human users this can in most cases be done by stating a date and time which is precise up to the second. However in some cases, in many cases systems where you see a lot of machine to machine communication this is not accurate enough.

In a lot of cases where Oracle SQL developers implement a timestamp option in the application they make use of the systdate function from within the Oracle database which returns the current time and date based upon the operating system clock of the operating system the Oracle database is running on.

There is however a more precise way of getting the exact date and time which is using the current_timestamp function from the datbase.

You can extend the current_timestamp with a precision parameter. If you do not provide a precision parameter the default will be 6.

For example if you do;

select
  current_timestamp(1)
from
    dual;

you wil get

10-NOV-12 02.20.30.600000000 PM US/PACIFIC

and if you do;

select
  current_timestamp(3)
from
    dual;

you wil get

10-NOV-12 02.20.30.615000000 PM US/PACIFIC

As you can see the provided precision in influencing the the number of milliseconds that is used in your timestamp. Depending on the nature of your application and the accuracy that is needed for the timestamping this can be very important and worth looking into.