Tuesday, November 20, 2012

Create Hadoop HBASE development system

Apache HBase is the Hadoop database, a distributed, scalable, big data store. HBase is a type of "NoSQL" database. "NoSQL" is a general term meaning that the database isn't an RDBMS which supports SQL as its primary access language, but there are many types of NoSQL databases: BerkeleyDB is an example of a local NoSQL database, whereas HBase is very much a distributed database. Technically speaking, HBase is really more a "Data Store" than "Data Base" because it lacks many of the features you find in an RDBMS, such as typed columns, secondary indexes, triggers, and advanced query languages, etc.

HBase isn't suitable for every problem.

First, make sure you have enough data. If you have hundreds of millions or billions of rows, then HBase is a good candidate. If you only have a few thousand/million rows, then using a traditional RDBMS might be a better choice due to the fact that all of your data might wind up on a single node (or two) and the rest of the cluster may be sitting idle.

Second, make sure you can live without all the extra features that an RDBMS provides (e.g., typed columns, secondary indexes, transactions, advanced query languages, etc.) An application built against an RDBMS cannot be "ported" to HBase by simply changing a JDBC driver, for example.

Consider moving from an RDBMS to HBase as a complete redesign as opposed to a port.
Third, make sure you have enough hardware. Even HDFS doesn't do well with anything less than 5 DataNodes (due to things such as HDFS block replication which has a default of 3), plus a NameNode.

HBase can run quite well stand-alone on a laptop - but this should be considered a development configuration only. And this is the topic we will touch in this blogpost, how can you quickly deploy a very light and simple installation of HBase on your laptop for pure development purposes so you can develop and test on the go.

Installing HBase is quite easy and straight forward. First step is to download the latest version of HBase from the website onto your Linux laptop. You can use one of the download mirrors from Apache which are listed on the Apache website.

When downloaded you will have to do a couple of things. First is that you have to go to the location where you have unpacked the downloaded HBase version and go to conf directory and edit the file hbase-site.xml in such a form that you can use it locally. As an example the configuration file on my ubuntu development laptop is shown below;

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

The values that you have to changes are the values for hbase.rootdir and hbase.zookeeper.property.dataDir . As this is a local installation you can point them to local directories in the same fashion as I have done in the example above.

When you have completed the hbase-site.xml file you can start HBase from the bin directory by executing start-hbase.sh which will start your local HBase installation.

As you might have been reading HBase will require you to have at least JAVA 1.6 installed on your system and will need the JAVA_HOME variable set. If JAVA_HOME is not set or JAVA is not installed you will notice a error message something like the one below;

|      Error: JAVA_HOME is not set and Java could not be found         |
| Please download the latest Sun JDK from the Sun Java web site        |
|       > http://java.sun.com/javase/downloads/ <                      |
|                                                                      |
| HBase requires Java 1.6 or later.                                    |
| NOTE: This script will find Sun Java whether you install using the   |
|       binary or the RPM based installer.                             |

If JAVA_HOME is set your local HBase installation will start and you will see a message like the one below;
starting master, logging to /home/louwersj/hbase/hbase-0.94.2/bin/../logs/hbase-louwersj-master-apex1.out
Post a Comment