Why you should add MDX to your SQL Server toolkit
Takeaway: The MDX language, along with SQL Server Analysis Services, can provide an efficient mechanism for summarizing and exploring large volumes of data. Get an overview of this powerful data-mining tool and find out how it can help you query large databases.
By Mike Gunderloy
Even if you can’t know everything about the hundreds of languages and thousands of tools that might come in handy on future projects, you can know enough to help you choose tools when you need them. Take the Multi-Dimensional Expression (MDX) language supported by Microsoft SQL Server Analysis Services. I can’t explain the entire language to you in a single article, but I can give you a broad outline so you’ll remember where to look when you need to know more.
OLTP vs. OLAP
It’s important to understand the distinction between two classes of database. A relational database such as SQL Server is sometimes called an online transaction processing (OLTP) database. Such databases are optimized for storing and retrieving single pieces of information. For example, you might want to know the sales of Colony Bagels at your company’s store in Corvallis, OR, in July 1997. To retrieve this information, you’d write a SQL query that might look something like this:
SELECT SUM(Sales.[Unit Sales])
FROM (Sales INNER JOIN Stores
ON Sales.StoreID = Stores.StoreID)
INNER JOIN Products
ON Sales.ProductID = Products.ProductID
WHERE Stores.StoreCity = 'Corvallis'
AND Products.ProductName = 'Colony Bagels'
AND Sales.SaleDate BETWEEN '07-01-1997' AND '07-31-1997'
Such queries are excellent for focusing on specific pieces of information. But where OLTP databases and SQL queries fall down is in spotting patterns in large amounts of data. Suppose you wanted to know how different product groups (e.g., bakery, deli, and produce) did at stores in one state compared to those in another state? Yes, you can write a SQL query to get that information, but because it has to chug through all of the source data, it will tend to be slow, and the joins and unions quickly get unwieldy.
That’s where online analytical processing (OLAP) comes into the picture. OLAP databases draw their source data from OLTP databases (or other large data sets), but they store it in a multidimensional summarized form. Without going into the low-level implementation details, this means that an OLAP database might precompute totals by state and by product.
The different types of summarized information are called dimensions, and the data structure that summarizes a large amount of data by multiple dimensions is called a cube. The summarized information (e.g., unit sales or net profit) is called a measure. Any cube will have one or more dimensions and one or more measures.
It takes time and computing horsepower to create a cube, which is typically done as a batch process at regular intervals. The advantage to this scheme is that after the cube has been created, it can be browsed without touching the original data.
Figure A shows a view of the Sales cube that ships as a sample with Microsoft SQL Server Analysis Services, which is the OLAP component of SQL Server.
Getting to MDX
That’s where the MDX language comes in. MDX is syntactically similar to SQL, but it offers far better capabilities for dealing with the multidimensional data stored in an OLAP cube. By far the most common task you’ll perform with MDX is the retrieval of information from a cube. After all, the whole point of an OLAP database is to make it easy to get summarized information. The tool for this is the MDX SELECT statement, which has a different structure from the SQL SELECT statement.
Schematically, an MDX SELECT looks like this:
SELECT <axis> [, <axis> …]
FROM <cube>
WHERE <slicer>
The axis clauses specify which summary information you’re interested in; normally these will be dimensions or parts of dimensions. The cube clause specifies the OLAP cube that contains the data. The slicer clause specifies the data that is relevant to your interest; often, this will be a measure. For example, remember the problem of looking at product groups and states? Here’s an MDX query to provide the required information:
SELECT
[Store].[Store Country].[USA].Children ON COLUMNS,
[Product].[All Products].[Food].Children ON ROWS
FROM [Sales]
WHERE ([Measures].[Unit Sales])
In this case, there are two axis specifications, one of which is to be the columns of the output and one of which is to be the rows. Each one specifies part of a dimension—for example, the COLUMNS axis will display all direct children of the U.S.A. member of the Store Country level of the Store dimension. Figure B shows the results of running this MDX query in the MDX Sample application that ships with SQL Server Analysis Services.
Looking at the results, you can start to see patterns in the data. For example, eggs do better proportionately in Oregon than produce does.
The depths of MDX
There’s much more to MDX than I can show you in a single query, of course. Here’s a short list of some of the advanced tasks that you can perform with MDX:
- Return up to 128 dimensions in a single result set (though you may have trouble finding a tool to display that many dimensions!)
- Collapse multiple dimensions on to a single axis for grouped summaries
- Find highest or lowest values
- Flexibly order results
- Calculate new members by combining existing measures
- Retrieve properties of individual measures
- Format results
- Compare different time periods
- Define custom functions in external libraries
The goal of MDX is simple: It works together with Analysis Services to provide an efficient mechanism for summarizing and exploring large volumes of data. Any time the requirements of a job include mining patterns and trends out of a large database, you should consider MDX as one potential part of your solution.
SponsoredWhite Papers, Webcasts, and Downloads
- Advances in Data Warehouse Performance: I/O Elimination in DB2 IBM
- Microsoft SQL Server 2005: Deployment and Tests in an iSCSI SAN Dell EqualLogic
- IBM Balanced Warehouse - The Flexible Foundation for Real Time Business Intelligence IBM
- Live Webcast: Top Ten Challenges with On-Premise Email Management Dell MessageOne
- Case Study: GHS Data Management - Improving Data Protection and Storage Reliability for Critical Databases Dell EqualLogic
Article Categories
- Security
- Security Solutions, IT Locksmith
- Networking and Communications
- E-mail Administration NetNote, Cisco Routers and Switches
- CIO and IT Management
- Project Management, CIO Issues, Strategies that Scale
- Desktops, Laptops & OS
- Windows 2000 Professional, Microsoft Word, Microsoft Excel, Microsoft Access, Windows XP,
- Data Management
- Oracle, SQL Server
- Servers
- Windows NT, Linux NetNote, Windows Server 2003
- Career Development
- Geek Trivia
- Software/Web Development
- Web Development Zone, Visual Basic, .NET





