Saturday, August 8, 2009

When Fact Tables Do Not Join to All Dimension Tables

When I read the questions people ask about data modeling on OBIEE forums, one that seems to come up frequently is what to do when you have fact tables that do not join to all dimension tables.
This picture illustrates the problem. FactTable2 joins to two dimension tables, but FactTable1 only joins to only one.
image
The data for this example is very simple:
image
Here’s what typically happens when the naive metadata designer is finished. The first query results look good.
image
But when the query contains a column from the second dimension table, suddenly the data that was there before disappears, and the first reaction is that something is wrong.
image
From one perspective, these are correct result. The query asked for Fact1 aggregated by DimA and DimX. Since Fact1 does not join to Dim X, nulls are the correct answer.
Looking at the SQL generated gives you an idea of how OBIEE navigates this query. It determines that the query wants Fact1 aggregated by DimADesc and DimXDesc. The only fact table that can aggregate by those two dimensions is Fact2. Therefore, the SQL it generates uses FactTable2 in the FROM clause, not FactTable1, even though Fact1 does not map to any column in FactTable2. The BI Server is aware Fact1 does not map to FactTable2, so it returns Null (C3 in the outer query block below) as the value of Fact1.
select distinct D1.C1 as C1,
D1.C2 as C2,
cast(null as double precision) as C3
from
(select distinct T1416.DIMADESC as C1,
T1420.DIMXDESC as C2
from DIMX T1420,
DIMA T1416,
FACTTABLE2 T1429
where (T1416.DIMAKEY = T1429.DIMAFKEY
and T1420.DIMXKEY = T1429.DIMXFKEY)
) D1
order by C1, C2

However, what if Fact1 is a level-based measure, always calculated at the Grand Total level for DimX? Then the BI Server knows it does not have to aggregate by the attribute values of DimX and returns these results.
image
In the SQL you can see that there is still a query block involving FactTable2. This query determines the values of  DimADesc and DimXDesc that will be in the results. The query to FactTable1 aggregates Fact1 by DimADesc. The two result sets are then fully outerjoined – even null values, if they are returned, will be joined, Null to Null.
WITH SAWITH0 AS 
(SELECT DISTINCT 
 T1420.DIMXDESC AS C1,
 T1416.DIMADESC AS C2
 FROM   DIMX T1420,
 DIMA T1416,
 FACTTABLE2 T1429
 WHERE  (T1416.DIMAKEY = T1429.DIMAFKEY
 AND T1420.DIMXKEY = T1429.DIMXFKEY)),
SAWITH1 AS 
(SELECT   
sum(T1424.FACT1) AS C1,
T1416.DIMADESC   AS C2
FROM     
DIMA T1416,
FACTTABLE1 T1424
WHERE    (T1416.DIMAKEY = T1424.DIMAFKEY)
GROUP BY T1416.DIMADESC)
SELECT   DISTINCT
CASE 
   WHEN SAWITH1.C2 IS NOT NULL THEN SAWITH1.C2
   WHEN SAWITH0.C2 IS NOT NULL THEN SAWITH0.C2
END AS C1,
SAWITH0.C1 AS C2,
SAWITH1.C1 AS C3
FROM  
SAWITH0
FULL OUTER JOIN SAWITH1
ON nvl(SAWITH0.C2,'q') = nvl(SAWITH1.C2,'q')
AND nvl(SAWITH0.C2,'z') = nvl(SAWITH1.C2,'z')
ORDER BY C1, C2


As the physical query shows, results will be determined by the foreign keys of DimX in FactTable2. If the rows where DimXFKey=2 are deleted, then the result rows where DimXDesc = Y will drop out.


image 


In this Business Model, both fact table sources (for FactTable1 and FactTable2) have an aggregation content of Detail for both dimensions (Detail is the default level when all logical levels are null in the logical table source). However, FactTable1 will not suffice as a source since it does not physically join to DimX.


image image 


Changing the aggregation content for FactTable1 does not alter the results or the SQL generated. (Note here that when one dimension has a level that is specified and the other dimension(s) are unspecified, the meaning the BI Server ascribes to unspecified is Grand Total).


image 








An alternate approach is to physically join DimX to FactTable1 using a complex join having the join condition 1=1. The aggregation content for both logical fact table sources can be set at Detail so that any query containing columns from DimX can use FactTable1 as a source. The measure, Fact1, no longer has to be set to Grand Total level for DimX. Since FactTable2 will not be involved in the query, the foreign key values in FactTable2 will not matter. Now the results are back to what we saw in the first query.


image


However, the SQL is quite different, and you can see why all the values of DimXDesc are returned.


select T1416.DIMADESC as c1,
     T1420.DIMXDESC as c2,
     sum(T1659.FACT1) as c3
from 
     DIMX T1420,
     DIMA T1416,
     FACTTABLE1 T1659 
where  ( T1416.DIMAKEY = T1659.DIMAFKEY ) 
group by T1416.DIMADESC, T1420.DIMXDESC
order by c1, c2


The important points to remember are 
that a physical fact table that does not join 
to a dimension table can be made to join with
 a complex join having the condition 1=1. 
The logical fact table source that contains this 
physical fact table can be set at the Detail level 
for the dimension that is joined like this. When it is possible for the user to select more than one value from the dimension that has a 1=1 join, the facts should be set at the grand total for that dimension.