kpi partners: Fact-Based Partitioning

One of the questions that comes up over and over again is how to show all periods in a query when facts don’t exist for all periods or how to show all products when facts don’t exist for all products.

To illustrate the problem, I created a database of fictional frosting sales. This is a very simple star schema having a period table, a product table, and a fact table containing weekly sales data for the last eleven weeks of 2003.

One of the frosting products, the “Party” frosting, was only introduced during the week of 11/6, however. Its sales data look like this – with nothing for the weeks in October.

How do you construct the metadata to show zero sales in October for Party frosting?

This is often thought of as a relational database outer join problem, which brings up another question frequently asked: how do you specify an outer join in the metadata?

Quite often, people think of this as a property of the physical join between these tables. However, if you look at the physical join properties in the metadata, you can see that the dialog does not give you any way to specify that the join should be an outer join. Notice that the join type drop down control is not active – it is grayed out.

When we designed OBIEE, we put control over the join type in the business model diagram because it really is a property of the business model – that is, it’s a property of the relationship between the period dimension and the facts. No matter how many period table sources or fact table sources there might be at different levels of granularity (aggregation), the join relationship should be the same within the business model.

So to change the join type, use the business model diagram and change the join type between the logical tables. In the screen shot shown here, I have changed the join type to “Right Outer”. (“Right” is correct in this context since the table having the rows to preserve, the Weeks logical table, is on the right in this dialog).

If you examine the physical SQL after making the change, you can see the outer join being done, with the “right outer join” now translated into a “left outer join” in the SQL since the Weeks table is now “on the left” in the SQL.

SELECT T2382.DESCRIPTION AS c1,
T2388.WEEK AS c2,
sum(T2373.QS) AS c3
FROM
FROSTINGPRODUCTS T2382,
FROSTINGWEEKS T2388 left outer join
FROSTINGFACTS T2373 On T2373.PERIODKEY = T2388.PERIODKEY
WHERE
( T2373.PRODKEY = T2382.PRODKEY AND T2382.DESCRIPTION = 'Party' )
GROUP BY T2382.DESCRIPTION, T2388.WEEK
ORDER BY c1, c2

So does this solve the problem? If you look at the results, you can see that it doesn’t.

Perhaps you might be thinking that you might need to outer join the product table to the fact table, too. But this doesn’t give you the right results, either. Even with full outer joins, the results are still exactly the same.

This is the physical SQL generated by using full outer joins between the dimension tables and the fact table in the business model.

SELECT t2382.DESCRIPTION AS c1
,t2388.week AS c2
,SUM(t2373.qs) AS c3
FROM frostingproducts t2382
FULL OUTER JOIN (frostingweeks t2388
FULL OUTER JOIN frostingfacts t2373
ON t2373.periodkey = t2388.periodkey)
ON t2373.prodkey = t2382.prodkey
WHERE (t2382.DESCRIPTION = 'Party')
GROUP BY
t2382.DESCRIPTION
,t2388.week
ORDER BY c1,c2

It produces these results

Let’s think about how to solve this problem in SQL. First calculate the sum of units sold by week for the ‘Party’ frosting. The SQL for that could look like this:

SELECT  
F.PRODKEY
,W.WEEK
,SUM(F.QS) UnitsSold
FROM    
FROSTINGWEEKS W
INNER JOIN FROSTINGFACTS F
ON F.PERIODKEY = W.PERIODKEY
WHERE    F.PRODKEY IN
(SELECT Prodkey
FROM   FROSTINGPRODUCTS
WHERE  Description = 'Party')
GROUP BY F.PRODKEY, W.WEEK

This SQL would produce the familiar results we have already seen.

For the next step, outer join the Products and Weeks table together without using the fact table.

Select P.Description
, P.Prodkey
, W.Week from
frostingproducts P
,frostingweeks W
where P.Description = 'Party'

Then left join the second result set to the first one. In terms of SQL, it would look like this:

WITH D1 AS
(select F.PRODKEY
,W.WEEK
,sum(F.QS) UnitsSold
FROM
FROSTINGWEEKS W inner join
FROSTINGFACTS F On F.PERIODKEY = W.PERIODKEY
WHERE F.PRODKEY IN (SELECT
Prodkey
FROM
FROSTINGPRODUCTS
WHERE
Description = 'Party')
GROUP BY F.PRODKEY, W.WEEK ORDER BY 2),
D2 AS
(Select P.Description, P.Prodkey, W.Week FROM
frostingproducts P
,frostingweeks W
WHERE P.Description = 'Party')
SELECT D2.Description, D2.Week, nvl(D1.UnitsSold,0) UnitsSold
FROM D2 left join D1
on D2.Prodkey = D1.Prodkey
AND D2.Week=D1.Week
ORDER BY 2

An OBIEE business model could be constructed to produce SQL like this. One way to do it would be to join the period and product tables into a single logical dimension table (outer joining the underlying physical tables in the logical table source), then left joining this dimension table to the logical fact table.

Here’s the physical join diagram, with the join between Products and Weeks being a complex join with the condition 1=1.

A single logical table source combines Products and Weeks.

The business model would contain just two logical tables and a single logical join. Note that it is a Left Outer join.

With this metadata, the logical SQL

SELECT Products.DESCRIPTION
, Weeks.WEEK
, FrostingFacts.UnitsSold
FROM Frosting2
WHERE Products.DESCRIPTION = 'Party'

produces this physical SQL

SELECT

T2534.DESCRIPTION AS c1
, T2539.WEEK AS c2
, sum(nvl(T2526.QS , 0)) AS c3
FROM
(FROSTINGPRODUCTS T2534 /* FrostingProducts2 */
full outer join
FROSTINGWEEKS T2539 /* FrostingWeeks2 */ On 1 = 1) left outer join
FROSTINGFACTS T2526 /* FrostingFacts2 */
On T2526.PERIODKEY = T2539.PERIODKEY
AND T2526.PRODKEY = T2534.PRODKEY
WHERE
( T2534.DESCRIPTION = 'Party' )
GROUP BY T2534.DESCRIPTION, T2539.WEEK
ORDER BY c1, c2

giving the desired results.

(Note on the logical SQL: the columns from the single logical table “FrostingProducts2” have been separated into two presentation layer folders, “Products” and “Weeks”).

However, I don’t recommend this as a solution as you may not always want these results, but now you’ve hard-wired them into the metadata.

A better approach is to take advantage of the concept of “fact-based partitioning”. Fact-based partitioning is a term meaning that facts are stored (“partitioned”) in separate physical tables. These tables are configured as separate logical fact table sources in the metadata. When facts from two fact table sources are included in a query, the BI Server generates SQL with two query blocks and full outer joins the result sets together -- just the behavior we want.

In this case, we need to preserve both the weeks and the products and need a fact that will be returned for all weeks and products specified by the query filter.

I use a one row table to hold this fact. If you’re using Oracle as a database, you can use the virtual table “Dual”. If you’re using SQL Server, you can create a Select statement without a FROM clause to do the same thing (or you can create a one-row table in the database).

Step 1: In the physical layer create a source for the fact that will always be returned. I’ve used a Select statement to create this table:

Step 2: Join this table to the dimension tables with a complex join having the condition 1=1.

Step 3: Add this fact (renamed here as “PreserveDimensions”) as a new logical fact sourced from a new logical table source.

The fact has an aggregation rule of Max. Therefore it will always equal 1.

Step 4: Add this fact to the presentation layer.

Step 5: Create a query with a filter PreserveDimensions = 1 and save this filter. Call it “Preserve Dimensions = True”.

Now to preserve dimensions the user just needs to add this filter to the query. Here’s the query with just a filter on product description:

Now here’s the same query with the Preserve Dimensions filter added:

This query produces physical SQL that has the same form we looked at earlier. It has a query block calculating UnitsSold by Product Description and Week. It has a query block joining the Product Table and the Dimension Table, qualifying the full outer join where Description = ‘Party’.

SELECT
max(T2616.One) AS c1
, T2631.DESCRIPTION AS c2
, T2637.WEEK AS c3
FROM
FROSTINGWEEKS T2637 /* FrostingWeeksFBP */
, FROSTINGPRODUCTS T2631 /* FrostingProductsFBP */
, (SELECT 1 One FROM Dual) T2616
WHERE ( T2631.DESCRIPTION = 'Party' )
GROUP BY
T2631.DESCRIPTION
, T2637.WEEK
HAVING max(T2616.One) = 1

The SQL then outer joins the two result sets together, thereby preserving all the dimension values specified in the query filter.

Note: to display zeros for the weeks where values are NULL, I created a logical column using the ifnull function. In the database, the actual physical column is called QS (i.e. quantity sold). The logical column QS (i.e. Quantity Sold) is mapped directly to this physical column. The logical column UnitsSold is based on the logical column QS.

Now the Answers user can choose whether to display NULLs as zeros or not by selecting the appropriate measure, as well as whether to preserve dimensions or not by clicking on the saved filter.

An alternative to adding the ifnull function in the metadata would be to use it in the column formula on the Criteria tab in Answers.

Note: on versions of OBIEE that included the Time Series functions Ago and ToDate prior to 10.1.3.3.2, this solution would have produced an error since complex joins from a period table source were not allowed. For these versions of OBIEE, the technique described here would have to be modified slightly by altering the Weeks table . Add a column to the Weeks table where all rows have the value of 1. Then join this “key” to the Dual table with a Key/Foreign Key physical join.

Sunday, April 5, 2009

Fact-Based Partitioning

Welcome

Blog Archive