Thursday, December 15, 2011

Hadoop explained by Mike Olson

Hadoop is an Apache project that provides a framework for running applications on large clusters of commodity hardware. The framework transparently provides applications with both reliability and data motion. Hadoop implements a computational paradigm named Map/Reduce, in which an application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both MapReduce and the Hadoop Distributed File System are designed so that node failures are handled automatically by the framework.
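To make the Map/Reduce paradigm concrete, here is a minimal sketch of the classic word-count job written against the standard Hadoop Java API; the input and output paths are assumptions passed as command-line arguments:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs on whichever node holds the input split and
  // emits (word, 1) for every word it sees.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: receives all counts for one word and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

If any node running a map task fails, the framework simply re-executes that fragment elsewhere; the job as a whole still completes.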

Looking at the future of computing and the future of data, Hadoop occupies a very strategic position on the roadmap and within the overall framework. It is one of the fundamental building blocks, responsible for parallelism within this framework, and can be seen as one of the main engines for handling big data.

In the video below, Mike Olson explains some parts of this future framework of computing and covers Hadoop and several other components in depth. Mike Olson is the CEO of Cloudera, one of the leading companies investing in the Hadoop community.

1 comment:

Unknown said...

Hadoop is a framework built from a combination of different sub-frameworks such as MapReduce, HDFS, HBase and Hive.
HDFS stores data blocks as files on the cluster nodes; there are no tables or columns in HDFS.
MapReduce provides powerful parallel processing of the data located on the clustered nodes.
Hive is a data warehousing tool and SQL wrapper for processing large amounts of data; Hive can be used for OLAP processing.
HBase is a database on top of HDFS; HBase can be used for real-time processing, i.e. OLTP workloads, as the sketch below illustrates.
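As a minimal sketch of that real-time access, the following uses the HBase Java client to write and then read back a single row. The table name "users" and column family "info" are hypothetical and assumed to have been created beforehand (for example, via the HBase shell):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // "users" is a hypothetical table with column family "info".
    HTable table = new HTable(conf, "users");

    // Write one cell: row "row1", column info:name.
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
    table.put(put);

    // Read the same cell back, a low-latency random read on top of HDFS.
    Get get = new Get(Bytes.toBytes("row1"));
    Result result = table.get(get);
    byte[] value = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
    System.out.println(Bytes.toString(value));

    table.close();
  }
}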

Please click What is Hadoop to learn more about the basics of Hadoop and its different sub-components.