When to use Hadoop

Friday, March 09, 2012

Hadoop is one of the big players in big data and can be seen as one of the main engines running the big-data machine. However, we still do not have a clear picture of what big data is. There are some definitions of when a lot of data becomes big data, but nobody has put a number on it so far, and most likely nobody ever will. I already zoomed in on this definition question in the "Map reduce into relation of Big Data and Oracle" post on this blog. A number of key factors determine whether data is big data: the volume of the data, the velocity at which the data grows, the variety of sources that add to the volume, and the value the data can "potentially" hold. These factors can help you decide when data is big data.

Then there is the question of when data (even big data) can still be handled in a standard relational database with a "standard" approach. There are some guidelines that can help you. Please note that this comparison is primarily about handling data in a relational database versus in Hadoop. It is not about storing data.

	RDBMS	Hadoop / MapReduce
Data Size	Gigabytes	Petabytes
Access	Interactive and batch	Batch
Structure	Fixed schema	No fixed schema (unstructured)
Language	SQL	Procedural (Java, C++, Ruby, etc.)
Integrity	High	Low
Scaling	Nonlinear	Linear
Updates	Read and write	Write once, read many times
Latency	Low	High

Taking these points into consideration can make the decision a little easier when you are struggling with the question of whether to use a MapReduce approach or an RDBMS approach.
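To make the comparison a bit more concrete, the table can be turned into a rough checklist. The Python sketch below is only an illustration: the `Workload` fields, the `suggest_platform` function, the 1 TB size threshold and the simple majority vote are all assumptions made for this example, not hard rules from the table.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a data-handling workload."""
    data_size_tb: float         # approximate data volume in terabytes
    needs_interactive: bool     # interactive, low-latency access is required
    fixed_schema: bool          # the data fits a fixed schema
    needs_high_integrity: bool  # high integrity is required
    updates_in_place: bool      # data is both read and written after loading


def suggest_platform(w: Workload, size_threshold_tb: float = 1.0) -> str:
    """Count how many rows of the comparison table point towards an RDBMS.

    The threshold and the majority vote are assumptions for illustration;
    a real decision should weigh the factors for your own situation.
    """
    rdbms_votes = sum([
        w.data_size_tb < size_threshold_tb,  # Data Size: gigabytes vs petabytes
        w.needs_interactive,                 # Access / Latency
        w.fixed_schema,                      # Structure
        w.needs_high_integrity,              # Integrity
        w.updates_in_place,                  # Updates
    ])
    return "RDBMS" if rdbms_votes >= 3 else "Hadoop / MapReduce"


# Example: a small transactional system versus a large batch-oriented log archive.
orders = Workload(0.05, True, True, True, True)
weblogs = Workload(2000.0, False, False, False, False)
print(suggest_platform(orders))
print(suggest_platform(weblogs))
```

A checklist like this will not make the decision for you, but it forces you to answer each row of the table explicitly for your own workload.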

Johan Louwers - Tech blog
