Saturday, July 26, 2014

Query Big Data with SQL

Data management used to be “easy” within enterprises: in most cases data was stored either in files on a file system or in a relational database. With some small exceptions, this was where you were able to find data. With the explosion of data we see today, and with the innovation around the question of how to handle it, a lot more options are coming into play. The rise of NoSQL databases and of HDFS based Hadoop solutions places data in many more places than only the two mentioned.
Having the option to store data where it is most likely to add the most value to the company is, from an architectural point of view, a great addition. Having the option, for example, not to choose a relational database but to store data in a NoSQL database or on an HDFS file system gives architects a lot more flexibility when creating an enterprise wide strategy. However, it also causes a new problem: combining data might become much harder. When you store all your data in a relational database you can easily query all the data with a single SQL statement. When parts of your data reside in a relational database, parts in a NoSQL database and parts in an HDFS cluster, getting a single overview becomes harder and might require a lot of additional coding.
Oracle announced “Oracle Big Data SQL”, an “addition” to the SQL language which enables you to query data not only in the Oracle Database but also, in the same select statement, data that resides in other places, namely Hadoop HDFS clusters and NoSQL databases. By extending the data dictionary of the Oracle database and allowing it to store information about data that is stored in NoSQL databases or Hadoop HDFS clusters, Oracle can now make use of those sources in combination with the data stored in the database itself.
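To give an idea of the kind of additional coding meant here, below is a minimal sketch in Python. Everything in it is an assumption made up for illustration: an in-memory SQLite table stands in for the relational database, a dictionary for the NoSQL store and a list of text lines for files on HDFS. It only shows that the join logic ends up in application code when the data lives in three places.

```python
import sqlite3

# Hypothetical stand-ins for three separate stores (all names and data are invented).
# 1) Relational database: a customers table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

# 2) NoSQL key-value store: customer attributes keyed by customer id.
nosql_store = {1: {"segment": "gold"}, 2: {"segment": "silver"}}

# 3) Files on HDFS: click-log lines in "customer_id,page" form.
hdfs_lines = ["1,/home", "2,/home", "1,/checkout"]


def customer_overview():
    """Combine the three sources by hand into one row per customer."""
    # Aggregate the log lines first: number of clicks per customer id.
    clicks = {}
    for line in hdfs_lines:
        customer_id, _page = line.split(",")
        clicks[int(customer_id)] = clicks.get(int(customer_id), 0) + 1

    # Walk the relational rows and look up the other two sources per row.
    overview = []
    query = "SELECT id, name FROM customers ORDER BY id"
    for customer_id, name in conn.execute(query):
        segment = nosql_store.get(customer_id, {}).get("segment")
        overview.append((name, segment, clicks.get(customer_id, 0)))
    return overview


print(customer_overview())
```

With a single store, the same overview would be one SQL statement with two joins and a group by.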

The Oracle Big Data SQL way of working will allow you to write single queries in your familiar SQL language but execute them on other platforms. The Oracle Big Data SQL implementation takes care of the translation to other languages, while developers can stick to the SQL they are used to.


Oracle Big Data SQL is available with Oracle Database 12c in combination with the Oracle Exadata engineered system and the Oracle Big Data Appliance engineered system. The use of Oracle engineered systems makes sense, as you are able to use InfiniBand connections between the two systems to eliminate the network bottleneck. The entire design of pushing parts of a query to another system is also in line with how Exadata works. In the Exadata machine a large part of the workload (the number crunching) is done not on the compute nodes but on the storage nodes. This ensures that more CPU cycles are available for other tasks, and that sorting, filtering and similar operations are done where they are supposed to be done: on the storage layer.

A similar strategy is what you see in the implementation of Oracle Big Data SQL. When a query (or part of a query) is pushed to the Oracle Big Data Appliance, only the answer is sent back and not the full set of data. This means that (again) the CPUs of the database instance are not loaded with tasks that can be done somewhere else (on the Big Data Appliance).
The option to use Oracle Big Data SQL has a number of advantages for our customers, on a technical level as well as on an architectural and integration level. We can now lower the load on the database instance CPUs and are not forced to manually create connections between relational databases and NoSQL and Hadoop HDFS solutions. At the same time it helps customers get a rapid return on investment. Some Capgemini statements can be found on the Oracle website in a post by Peter Jeffcock and Brad Tewksbury from Oracle after the Oracle key partner briefing on Oracle Big Data SQL.
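The effect of sending back only the answer can be shown with a small Python sketch. The `StorageNode` class, its method names and the data are all hypothetical and have nothing to do with the actual Oracle implementation. The sketch only compares how many rows reach the database with and without pushing the filter to the node that holds the data.

```python
from dataclasses import dataclass


@dataclass
class StorageNode:
    """Hypothetical storage-side node that holds rows and can filter them locally."""
    rows: list

    def scan_all(self):
        # No pushdown: every row is shipped to the database.
        return list(self.rows)

    def scan_filtered(self, predicate):
        # Pushdown: the predicate runs where the data lives,
        # so only matching rows are shipped.
        return [row for row in self.rows if predicate(row)]


def is_large(row):
    """The filter of the query: keep rows with an amount of 9900 or more."""
    return row["amount"] >= 9900


node = StorageNode(rows=[{"id": i, "amount": i * 10} for i in range(1000)])

# Without pushdown the database receives all rows and filters them itself.
shipped_without = node.scan_all()
result_without = [row for row in shipped_without if is_large(row)]

# With pushdown the database receives only the matching rows.
result_with = node.scan_filtered(is_large)

print(len(shipped_without), len(result_with))  # 1000 10
```

Both approaches give the same result set. The difference is where the filtering CPU cycles are spent and how much data has to travel between the two systems.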

Sunday, July 13, 2014

Oracle will take three years to become a cloud company

Traditional software vendors who have been relying on a steady income of license revenue are forced to either change their existing business model radically or be overrun by new and upcoming companies. The change that cloud computing is bringing is compared by some industry analysts to the introduction of the Internet. The introduction and rapid growth of the Internet started a completely new sub-industry within the IT industry and created the IT bubble, which bankrupted numerous companies when it burst.

As the established companies see the threat and the possibilities of cloud computing rising, they are trying to change direction to ensure survival. Oracle, one of the biggest enterprise oriented software vendors at this moment, is currently changing direction and stepping into cloud computing in full swing. It does this by extending its more traditional way of doing business, providing tools to create private cloud solutions for customers, and also by becoming a cloud vendor itself in the form of IaaS, SaaS, DBaaS and some other forms of cloud computing.

According to a recent article from Investor's Business Daily the transition for Oracle will take around three years to complete. According to Susan Anthony, an analyst for Mirabaud Securities, it will take around five years until cloud based solutions contribute significantly more than the current license sales model:

"As the shift takes place, the software vendors' new license revenues will ... be replaced to some extent by the cloud-subscription model, which within three years will match the revenues that would have been generated by the equivalent perpetual license and, over five years, contribute significantly more"

The key to success for Oracle and for other companies will be to attract people with a different mindset than the people they currently have. The traditional way of thinking is so deeply embedded in these companies that a more cloud minded generation will be needed to help turn the cloud transformation of traditional companies into a success. Michael Turits, an analyst for Raymond James & Associates, states the following on this critical success factor:

"It takes a lot to turn the battleship and transition a legacy (software) company into a cloud company, We believe they are hiring people to focus on cloud sales and that the incentive structure is being altered to speed the transition."

Analysts are united in the belief that this is a necessary transition for Oracle to survive, but also that in the short term it will hurt the revenue stream of the company and by doing so negatively influence the stock price for the upcoming years. Rick Sherlund, a Nomura Securities analyst, wrote in a June 25 research note:

"Oracle, like other traditional, on-premise software vendors, will be financially disadvantaged over the short term as its upfront on-premise license revenues are cannibalized by the recurring cloud-based revenues, therefore, we model expected license revenues to be flat to down for the next two years (during) the transition."

Currently we can see the transition taking place. On June 25, 2014, Mark Hurd presented the Oracle cloud strategy for the upcoming years, covering not only the expansion in global datacenters for hosting the new business model but also the growth predictions for the upcoming years. Looking at the growth in datacenters, you can see that Oracle is serious about its cloud strategy and transformation.


The full presentation deck can be found embedded below:


Wednesday, July 09, 2014

Room rates based upon big data

Hotels traditionally have flexible rates for their rooms. The never ending challenge for hotels, however, is when to raise rates and when to drop them. A commonly seen solution is to raise prices as the date the room will be occupied comes closer and, one or two days in advance, to drop the price if the room is not sold yet. On average this works quite well; however, it is a suboptimal and unsophisticated way of introducing dynamic pricing for hotel rooms. The real value of a room depends on a large set of parameters that are constantly changing.
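The simple rule described above can be written down in a few lines, which also shows how little information it uses. The sketch below is in Python. The function name, the day thresholds and the percentages are assumptions chosen purely for illustration, not figures from any hotel.

```python
def rule_based_rate(base_rate, days_until_stay):
    """Rate for a room that is still unsold, using only the days until the stay.

    All thresholds and percentages are assumed values for illustration.
    """
    if days_until_stay <= 2:
        # Last one or two days: drop the price to fill the room.
        return round(base_rate * 0.80, 2)
    if days_until_stay <= 14:
        # Close to the date: highest rate.
        return round(base_rate * 1.25, 2)
    if days_until_stay <= 30:
        # Getting closer: moderate increase.
        return round(base_rate * 1.10, 2)
    # Far in advance: the plain base rate.
    return round(base_rate, 2)


for days in (60, 21, 7, 1):
    print(days, rule_based_rate(100.0, days))
```

The only input besides the base rate is the calendar, so the rule reacts the same way whether or not anything is happening in town.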

For example the weather, vacations, conventions in town or airlines that are on strike will all influence the demand for rooms. If you are able to react directly to changing variables you will be able to make the average hotel room more profitable. Keeping track of all kinds of information from a large number of sources and benchmarking this against results from the past is an extremely difficult task to do manually, or even to code a custom application for. Duetto recently raised $21M in venture capital from Accel Partners to expand their SaaS solution, which provides exactly this service to hotels.

Duetto provides a SaaS solution from the cloud that keeps track of all potentially interesting data sources and mines this data to dynamically change the room rates for a hotel based upon the results.


By mining and processing big data Duetto is able to advise hotels on when to drop or raise the price. This advice can change at a moment's notice, without the hotel employees having to keep track of everything that is happening in the area. Duetto provides hotels with an easy solution for implementing intelligent dynamic pricing. The big advantage Duetto offers is that it is a SaaS solution that is ready to run from day one, instead of a home grown solution which might take a long time to develop, test and benchmark before it becomes usable.
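As a rough illustration of pricing on changing signals such as weather, vacations, conventions and strikes, here is a minimal Python sketch. It is not how Duetto works, as their method is not described here. The signal names, the uplift per signal and the floor and cap are all assumed values. In a real system they would have to be derived from historical booking results.

```python
# Assumed uplift per demand signal (illustrative values only).
SIGNAL_UPLIFT = {
    "convention_in_town": 0.30,
    "airline_strike": 0.15,
    "school_vacation": 0.10,
    "bad_weather": -0.05,
}


def dynamic_rate(base_rate, active_signals, floor=0.7, cap=2.0):
    """Scale the base rate by the summed uplift of the active signals.

    The multiplier is clamped between floor and cap so that a pile-up of
    signals cannot push the rate to an extreme value.
    """
    uplift = sum(SIGNAL_UPLIFT.get(signal, 0.0) for signal in active_signals)
    multiplier = min(max(1.0 + uplift, floor), cap)
    return round(base_rate * multiplier, 2)


print(dynamic_rate(100.0, []))
print(dynamic_rate(100.0, ["bad_weather"]))
print(dynamic_rate(100.0, ["convention_in_town", "airline_strike"]))
```

The hard part is not this calculation. It is collecting the signals from many sources in time and estimating their effect, which is the part a ready to run service takes off the hands of the hotel.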

It is not unlikely that Duetto will expand its services to other industries in the near future. The market for dynamic pricing based upon big data will only grow in the upcoming years, and Duetto will be in an ideal position to expand its services to new growth markets.

Monday, July 07, 2014

Using R and Oracle Exadata

Currently the R language is the language of choice for statistical computing and is widely adopted in the commercial and scientific communities. R is a free statistical language developed in 1993 and released under the GNU General Public License. Traditionally R has been used to do statistical computations on large sets of data, and due to this it is seeing adoption in the Big Data ecosystem, even though it is not as widely adopted as, for example, the MapReduce programming paradigm, which owes its high adoption rate to Apache Hadoop.

Even so, R is claiming its place in the Big Data ecosystem and is seeing enterprise grade adoption. Due to this there are a number of enterprise ready R implementations available. Oracle is one of the companies that have developed an enterprise ready R distribution, named “Oracle R Enterprise”. The interesting thing about the Oracle R Enterprise distribution is that it becomes a part of the database server itself.

In general the idea is that multiple R engines are spawned on the database server and work in parallel to execute the computations needed. Depending on your programming, the results can be stored in the database, returned to a workstation or sent to a Hadoop cluster to execute additional computations. In addition, thanks to the Hadoop connector, R inside the database server can potentially also make use of data inside the Hadoop cluster if needed. From a high level perspective this will look like the diagram below.
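The general pattern of several engines each working on a part of the data, with the partial results combined afterwards, can be sketched as follows. This is a Python illustration of the pattern only. It does not use R or the Oracle R Enterprise API. A thread pool stands in for the spawned engines, and the function names are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    """Work done by one engine: sum and row count of its own chunk."""
    return sum(chunk), len(chunk)


def parallel_mean(values, engines=4):
    """Split the data over a number of engines and combine their partial results."""
    if not values:
        raise ValueError("values must not be empty")

    # Ceiling division, so that at most `engines` chunks are created.
    size = -(-len(values) // engines)
    chunks = [values[i:i + size] for i in range(0, len(values), size)]

    # A thread pool is used here as a stand-in for the parallel engines.
    with ThreadPoolExecutor(max_workers=engines) as pool:
        partials = list(pool.map(partial_stats, chunks))

    total = sum(chunk_sum for chunk_sum, _ in partials)
    count = sum(chunk_count for _, chunk_count in partials)
    return total / count


print(parallel_mean(list(range(1, 101))))  # 50.5
```

Each engine returns only a small partial result (a sum and a count), so very little data has to move to produce the final answer.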


Oracle provides engineered systems for Big Data, for analytics and for the Oracle database. This means we can also deploy the scenario outlined above on engineered systems. In the diagram below we use a pure Oracle engineered systems solution; however, this is not required. You can use Oracle engineered systems where you deem them needed and leave them out where you do not need them. There are, however, large benefits to deploying a full engineered systems solution.

In the above example diagram the deployment uses Oracle Exadata, Oracle Big Data Appliances and the Oracle Exalytics machine. By combining those you benefit both from R and from the capabilities of the Oracle engineered systems. When you need to deploy R for analytical computing and you are also using Oracle databases and applications on a wider scale in your IT landscape, it will be extremely beneficial to give Oracle R Enterprise consideration and, depending on the size of your data, to combine it with Oracle engineered systems.