Saturday, February 01, 2014

Amazon Redshift cloud-based data warehouse service

In the era of big data and the massive collection of data by corporations, there is a growing need for scalable and affordable ways to store all of that data and process it at large scale. For some of this data, which might not be mission critical to your operations or is not particularly confidential, a solution might be available in the cloud. More and more cloud companies provide the option to use cloud-based databases for massive data storage or, in the case of Amazon Redshift, a complete data warehouse service.

What Amazon Redshift in essence provides is a large-scale cloud data platform that is based upon a columnar storage principle, is accessible via standard JDBC and ODBC, and lets you execute standard SQL commands. Amazon Redshift is a tailored implementation of the ParAccel platform. Because it exposes these standard interfaces, you can connect your standard business intelligence tooling to Amazon Redshift and work with it.
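
To make the columnar storage principle a bit more concrete, here is a toy sketch in Python. It is not how Redshift is implemented internally; the table, the column names and the `to_columns` helper are all made up for illustration. It only shows why an aggregate over one column needs to touch less data when values are stored per column instead of per row.

```python
# Toy illustration of row-oriented versus column-oriented layouts.
# The table and its values are invented for this example.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 45.5},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every full record is visited to read a single field.
total_from_rows = sum(record["amount"] for record in rows)

# Column layout: only the values of the "amount" column are read.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```

Both totals are the same; the difference is that the second one never looks at `order_id` or `region`, which is the kind of saving a columnar store aims for on wide warehouse tables.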

For initial loading Amazon offers a number of options; for example, you can load data from Amazon S3, Amazon DynamoDB or AWS Data Pipeline. Next to this, numerous other ways, including SQL-style insert batches, can be used to load your initial data into Amazon Redshift.
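
As a rough sketch of what a load from Amazon S3 can look like, the Python helper below only builds the text of a Redshift `COPY` statement; it does not connect to anything. The helper name `build_copy_statement`, the table name, the bucket path and the IAM role ARN are hypothetical placeholders, and the statement uses the `IAM_ROLE` form of authorization, which is an assumption about how your cluster is set up.

```python
import re


def build_copy_statement(table, s3_uri, iam_role_arn, delimiter=","):
    """Return a Redshift COPY statement that loads delimited files from S3."""
    # Keep the sketch safe: accept only simple (optionally schema-qualified) names.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_.]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    # Values are embedded in single-quoted literals, so reject embedded quotes.
    for value in (s3_uri, iam_role_arn, delimiter):
        if "'" in value:
            raise ValueError("single quotes are not allowed in arguments")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"DELIMITER '{delimiter}';"
    )


# Hypothetical table, bucket prefix and role, for illustration only.
statement = build_copy_statement(
    "sales",
    "s3://example-bucket/initial-load/sales_",
    "arn:aws:iam::123456789012:role/ExampleRedshiftLoadRole",
)
print(statement)
```

The resulting statement would then be executed through whatever SQL client you already use against the cluster.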

Even though I do not think it is a good idea to have a cloud-only, cloud-by-default strategy, as some companies try to implement, the cloud can be a good strategy for some of your systems and some of your data. You have to think about how to manage data and systems in the cloud and realize that some of the implications are good and some are bad. Next to that, there is also the legal side of storing your data in the cloud. This legal side might face some added complexity with the upcoming European Union data protection reform, which is expected to become active sometime in 2014.

However, in the cases where you do have the option to store your data in the cloud and you are comfortable with it, Amazon Redshift is a very valid choice for massive data storage for data warehousing. Amazon has provided some fairly acceptable security measures for non-confidential data. For data security, Amazon currently encrypts every data block with AES-256, which is considered fairly secure even though a related-key cryptanalysis attack has been discovered by Alex Biryukov and Dmitry Khovratovich. To speed up encryption and decryption, Amazon uses hardware-accelerated encryption so that this layer of security does not impact performance (too much).

On the Amazon website a number of BI solutions are listed as compatible with Amazon Redshift, meaning they should be able to connect to it to provide BI capabilities. The products currently listed are: Actian, Actuate, Birst, Chartio, Dundas, Infor, Jaspersoft, JReport, Logi Analytics, Looker, MicroStrategy, Pentaho, Redrock BI, SiSense and Tableau. All of these companies provide software that is capable of doing business intelligence on top of the Amazon Redshift platform.

As a consultancy and implementation partner, Amazon names Capgemini as a preferred partner, with the following quotation from Capgemini on the Amazon website: "We’re impressed with Amazon Redshift's ease of implementation, scalability to meet virtually any price point, and ability to handle the toughest big data and predictive analytics demands of our clients. In the era of Big Data and advanced analytics, we see Amazon Redshift and other AWS services, like Amazon Elastic MapReduce, as vital additions to the solutions and service offerings we provide".

A lesser-known, or rather overlooked, fact is that next to the above-mentioned products you can connect essentially any BI tool that supports JDBC or ODBC to Amazon Redshift, while keeping your BI processing platform on premise or somewhere in the cloud. You are NOT bound to the BI vendors that Amazon lists. For example, it is very well possible to connect your Oracle OBIEE implementation to Amazon Redshift.
The above image comes from the rittmanmead.com website, which has some interesting articles on OBIEE and how you can connect it to numerous data sources, including Amazon Redshift. With Oracle OBIEE you are not bound to using an Oracle database, and you can still use all the features of OBIEE.
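
Because the connection goes over standard JDBC or ODBC, pointing a generic tool at Redshift mostly comes down to the connection string. The small Python sketch below only builds such a JDBC URL. The helper name `redshift_jdbc_url`, the cluster endpoint and the database name are hypothetical; the port 5439 and the two URL prefixes (PostgreSQL-style and Amazon's own Redshift driver) are what I understand to be the usual defaults, so check them against your own cluster and driver.

```python
def redshift_jdbc_url(endpoint, database, port=5439, driver="postgresql"):
    """Build a JDBC URL for a Redshift cluster endpoint.

    driver is the URL prefix: "postgresql" for a PostgreSQL-compatible
    JDBC driver, "redshift" for Amazon's own Redshift JDBC driver.
    """
    if driver not in ("postgresql", "redshift"):
        raise ValueError(f"unsupported driver prefix: {driver!r}")
    return f"jdbc:{driver}://{endpoint}:{port}/{database}"


# Hypothetical cluster endpoint and database name, for illustration only.
url = redshift_jdbc_url(
    "examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com", "dev"
)
print(url)
```

The same endpoint, port and database name are what you would enter in an ODBC data source definition for tools that prefer ODBC over JDBC.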
