Sunday, April 5, 2009

Fact-Based Partitioning

One of the questions that comes up over and over again is how to show all periods in a query when facts don’t exist for all periods or how to show all products when facts don’t exist for all products.

To illustrate the problem, I created a database of fictional frosting sales. This is a very simple star schema having a period table, a product table, and a fact table containing weekly sales data for the last eleven weeks of 2003.

clip_image002

clip_image004

One of the frosting products, the “Party” frosting, was only introduced during the week of 11/6, however. Its sales data look like this – with nothing for the weeks in October.

clip_image006

How do you construct the metadata to show zero sales in October for Party frosting?

This is often thought of as a relational database outer join problem, which brings up another question frequently asked: how do you specify an outer join in the metadata?

Quite often, people think of this as a property of the physical join between these tables. However, if you look at the physical join properties in the metadata, you can see that the dialog does not give you any way to specify that the join should be an outer join. Notice that the join type drop down control is not active – it is grayed out.

clip_image008

When we designed OBIEE, we put control over the join type in the business model diagram because it really is a property of the business model – that is, it’s a property of the relationship between the period dimension and the facts. No matter how many period table sources or fact table sources there might be at different levels of granularity (aggregation), the join relationship should be the same within the business model.

So to change the join type, use the business model diagram and change the join type between the logical tables. In the screen shot shown here, I have changed the join type to “Right Outer”. (“Right” is correct in this context since the table having the rows to preserve, the Weeks logical table, is on the right in this dialog).

clip_image010

If you examine the physical SQL after making the change, you can see the outer join being done, with the “right outer join” now translated into a “left outer join” in the SQL since the Weeks table is now “on the left” in the SQL.

SELECT T2382.DESCRIPTION AS c1,
T2388.WEEK AS c2,
sum(T2373.QS) AS c3
FROM
FROSTINGPRODUCTS T2382,
FROSTINGWEEKS T2388 left outer join
FROSTINGFACTS T2373 On T2373.PERIODKEY = T2388.PERIODKEY
WHERE
( T2373.PRODKEY = T2382.PRODKEY AND T2382.DESCRIPTION = 'Party' )
GROUP BY T2382.DESCRIPTION, T2388.WEEK
ORDER BY c1, c2

So does this solve the problem? If you look at the results, you can see that it doesn’t.

clip_image012

Perhaps you might be thinking that you might need to outer join the product table to the fact table, too. But this doesn’t give you the right results, either. Even with full outer joins, the results are still exactly the same.

This is the physical SQL generated by using full outer joins between the dimension tables and the fact table in the business model.

SELECT t2382.DESCRIPTION AS c1
,t2388.week AS c2
,SUM(t2373.qs) AS c3
FROM frostingproducts t2382
FULL OUTER JOIN (frostingweeks t2388
FULL OUTER JOIN frostingfacts t2373
ON t2373.periodkey = t2388.periodkey)
ON t2373.prodkey = t2382.prodkey
WHERE (t2382.DESCRIPTION = 'Party')
GROUP BY
t2382.DESCRIPTION
,t2388.week
ORDER BY c1,c2

It produces these results

clip_image014.

Let’s think about how to solve this problem in SQL. First calculate the sum of units sold by week for the ‘Party’ frosting. The SQL for that could look like this:

SELECT  
F.PRODKEY
,W.WEEK
,SUM(F.QS) UnitsSold
FROM
FROSTINGWEEKS W
INNER JOIN FROSTINGFACTS F
ON F.PERIODKEY = W.PERIODKEY
WHERE F.PRODKEY IN
(SELECT Prodkey
FROM FROSTINGPRODUCTS
WHERE Description = 'Party')
GROUP BY F.PRODKEY, W.WEEK


This SQL would produce the familiar results we have already seen.



clip_image016



For the next step, outer join the Products and Weeks table together without using the fact table.



Select P.Description
, P.Prodkey
, W.Week from
frostingproducts P
,frostingweeks W
where P.Description = 'Party'

clip_image018

Then left join the second result set to the first one. In terms of SQL, it would look like this:

WITH D1 AS
(select F.PRODKEY
,W.WEEK
,sum(F.QS) UnitsSold
FROM
FROSTINGWEEKS W inner join
FROSTINGFACTS F On F.PERIODKEY = W.PERIODKEY
WHERE F.PRODKEY IN (SELECT
Prodkey
FROM
FROSTINGPRODUCTS
WHERE
Description = 'Party')
GROUP BY F.PRODKEY, W.WEEK ORDER BY 2),
D2 AS
(Select P.Description, P.Prodkey, W.Week FROM
frostingproducts P
,frostingweeks W
WHERE P.Description = 'Party')
SELECT D2.Description, D2.Week, nvl(D1.UnitsSold,0) UnitsSold
FROM D2 left join D1
on D2.Prodkey = D1.Prodkey
AND D2.Week=D1.Week
ORDER BY 2

clip_image020

An OBIEE business model could be constructed to produce SQL like this. One way to do it would be to join the period and product tables into a single logical dimension table (outer joining the underlying physical tables in the logical table source), then left joining this dimension table to the logical fact table.

Here’s the physical join diagram, with the join between Products and Weeks being a complex join with the condition 1=1.



clip_image022



A single logical table source combines Products and Weeks.



clip_image024

The business model would contain just two logical tables and a single logical join. Note that it is a Left Outer join.

clip_image026

With this metadata, the logical SQL

SELECT Products.DESCRIPTION
, Weeks.WEEK
, FrostingFacts.UnitsSold
FROM Frosting2
WHERE Products.DESCRIPTION = 'Party'

produces this physical SQL

SELECT

T2534.DESCRIPTION AS c1
, T2539.WEEK AS c2
, sum(nvl(T2526.QS , 0)) AS c3
FROM
(FROSTINGPRODUCTS T2534 /* FrostingProducts2 */
full outer join
FROSTINGWEEKS T2539 /* FrostingWeeks2 */ On 1 = 1) left outer join
FROSTINGFACTS T2526 /* FrostingFacts2 */
On T2526.PERIODKEY = T2539.PERIODKEY
AND T2526.PRODKEY = T2534.PRODKEY
WHERE
( T2534.DESCRIPTION = 'Party' )
GROUP BY T2534.DESCRIPTION, T2539.WEEK
ORDER BY c1, c2

giving the desired results.

clip_image028

(Note on the logical SQL: the columns from the single logical table “FrostingProducts2” have been separated into two presentation layer folders, “Products” and “Weeks”).

However, I don’t recommend this as a solution as you may not always want these results, but now you’ve hard-wired them into the metadata.

A better approach is to take advantage of the concept of “fact-based partitioning”. Fact-based partitioning is a term meaning that facts are stored (“partitioned”) in separate physical tables. These tables are configured as separate logical fact table sources in the metadata. When facts from two fact table sources are included in a query, the BI Server generates SQL with two query blocks and full outer joins the result sets together -- just the behavior we want.

In this case, we need to preserve both the weeks and the products and need a fact that will be returned for all weeks and products specified by the query filter.

I use a one row table to hold this fact. If you’re using Oracle as a database, you can use the virtual table “Dual”. If you’re using SQL Server, you can create a Select statement without a FROM clause to do the same thing (or you can create a one-row table in the database).

Step 1: In the physical layer create a source for the fact that will always be returned. I’ve used a Select statement to create this table:

clip_image030

Step 2: Join this table to the dimension tables with a complex join having the condition 1=1.

Step 3: Add this fact (renamed here as “PreserveDimensions”) as a new logical fact sourced from a new logical table source.


clip_image032

The fact has an aggregation rule of Max. Therefore it will always equal 1.

Step 4: Add this fact to the presentation layer.

Step 5: Create a query with a filter PreserveDimensions = 1 and save this filter. Call it “Preserve Dimensions = True”.



clip_image034



Now to preserve dimensions the user just needs to add this filter to the query. Here’s the query with just a filter on product description:



clip_image036



Now here’s the same query with the Preserve Dimensions filter added:



clip_image038

This query produces physical SQL that has the same form we looked at earlier. It has a query block calculating UnitsSold by Product Description and Week. It has a query block joining the Product Table and the Dimension Table, qualifying the full outer join where Description = ‘Party’.

SELECT
max(T2616.One) AS c1
, T2631.DESCRIPTION AS c2
, T2637.WEEK AS c3
FROM
FROSTINGWEEKS T2637 /* FrostingWeeksFBP */
, FROSTINGPRODUCTS T2631 /* FrostingProductsFBP */
, (SELECT 1 One FROM Dual) T2616
WHERE ( T2631.DESCRIPTION = 'Party' )
GROUP BY
T2631.DESCRIPTION
, T2637.WEEK
HAVING max(T2616.One) = 1

The SQL then outer joins the two result sets together, thereby preserving all the dimension values specified in the query filter.

Note: to display zeros for the weeks where values are NULL, I created a logical column using the ifnull function. In the database, the actual physical column is called QS (i.e. quantity sold). The logical column QS (i.e. Quantity Sold) is mapped directly to this physical column. The logical column UnitsSold is based on the logical column QS.

clip_image040

clip_image042

Now the Answers user can choose whether to display NULLs as zeros or not by selecting the appropriate measure, as well as whether to preserve dimensions or not by clicking on the saved filter.

clip_image044

An alternative to adding the ifnull function in the metadata would be to use it in the column formula on the Criteria tab in Answers.

image

Note: on versions of OBIEE that included the Time Series functions Ago and ToDate prior to 10.1.3.3.2, this solution would have produced an error since complex joins from a period table source were not allowed. For these versions of OBIEE, the technique described here would have to be modified slightly by altering the Weeks table . Add a column to the Weeks table where all rows have the value of 1. Then join this “key” to the Dual table with a Key/Foreign Key physical join.