Sunday, August 24, 2008

Microsoft SQL Server OLAP Services

While cleaning my desk I found an interesting, however somewhat old, printout on Microsoft SQL Server OLAP Services and distinct counts in combination with basket analysis. The document is going into how to quickly build a basket analysis with distinct.

For example, suppose you have a cube that analyzes sales transactions. It has dimensions that describe customers (geography, education, income level, gender), products (classification, color, size), time, and the sales rep through the organizational structure. The measures include information about revenue, quantity, and discounts.

One of the most common questions would be, "How many customers bought a specific product?" An even better and more general question might be, "How many customers are buying each product?"

Although this last question seems simple, it is not. A regular COUNT measure will not provide correct results because double counts may occur. If a single customer buys a product more than once, a regular COUNT with the measure will count the product sale by customer twice. In order to get the correct results, each customer needs to be counted only once. This is the classic DISTINCT COUNT problem, and it requires a fairly complex resolution in the online analytical processing (OLAP) environment.

The problem may become even more interesting if the question becomes, "How many customers bought a specific basket of products?" Take the "Diapers & Beer" example, "How many customers bought both diapers and beer?" This type of question falls under the Basket Analysis problem category.

This document discusses the techniques to solve these two classic problems, DISTINCT COUNT and Basket Analysis. It assumes that the reader has a basic understanding of the concepts of OLAP in general, OLAP Services in particular, and MDX.

For the complete document you have to visit the Microsoft Technet site.

No comments: