Monday, May 25, 2009

Evaluate with Date Functions

If you’re not concerned about database portability – for example, you use Oracle and that’s that, forever – then the evaluate function in OBIEE can be useful. However, using the evaluate function can be tricky, and the documentation could be better.

Here’s an example of using Evaluate with the Oracle next_day function, which returns the date of the next specified day of the week following any given date. This function is useful for calculating the week ending date for any given date. If your standard week ends on Saturday, then the expression you would use would be next_day(<date>, ‘saturday’).  This is quite simple, but how do you use it within OBIEE?

Here’s a simple subject area in OBIEE which I used to test how to use next_day( ). It contains two logical tables: a Days table and a Facts table. The facts table shows how many calls occurred on any day.

image

As these query results show, there is one call per day.

image

The objective was to sum up the number of calls for each Saturday-ending week using the Oracle next_day function.

When you use the evaluate function in OBIEE, the expression builder presents you with this template:

image 

Following the template, you would write this:

image

However, this syntax produced a “Union of non-compatible types” error.

image

I do not know what this error is really saying. Does the column name have to be enclosed in single quotes to make it a string?

image

The administration tool accepts this, but you get a SQL error when you use it in a query.

ORA-01858: a non-numeric character was found where a numeric was expected

The reason is obvious when you look at the SQL generated:

select next_day('"XE".""."XE"."DAYS"."DATE1"','saturday') as C1
,sum(T3392.CALLS) as C2
from DAYS T3386
,DAYSCALLS T3392
where (T3386.id = T3392.id)
group by next_day('"XE".""."XE"."DAYS"."DATE1"','saturday')
order by C1

Instead of providing a column identifier that had a date data type as an argument in the next_day function, the SQL contained a string value.

My next thought when I encountered this was to include the column name as part of the first “str_Expr”.

image

However, this produced another Oracle error:

ORA-01741: illegal zero-length identifier

So to fix this I edited the column identifier.

image

The administration tool accepted this, but it produced a different SQL error at run time:

ORA-00904: "DAYS"."DATE1": invalid identifier

The reason this is an invalid identifier is that the SQL OBIEE generated included a table alias for the DAYS table, T3392.

select next_day("DAYS"."DATE1",'saturday') as C1
,sum(T3392.CALLS) as C2
from DAYS T3386
,DAYSCALLS T3392
where (T3386.id = T3392.id)
group by next_day("DAYS"."DATE1",'saturday')
order by C1

To fix this, I deleted the DAYS identifier in the function. This might work, but not if you have a DATE1 column defined in more than one table in the FROM clause. This data contained a DATE1 column both in the DAYS table and DAYSCALLS table. Therefore, the query produced another Oracle error.

ORA-00918: column ambiguously defined

To fix this, a table identifier was needed. Since OBIEE used T3386, I included this in the evaluate function.

image

The administration tool accepted this, and OBIEE generated correct SQL that Oracle accepted.

select next_day("T3386"."DATE1",'saturday') as C1
,sum(T3392.CALLS) as C2
from DAYS T3386
,DAYSCALLS T3392
where (T3386.id = T3392.id)
group by next_day("T3386"."DATE1",'saturday')
order by C1

However, the results in Answers were not correct.

image

Notice that instead of getting a week ending date, we see something that looks like a month, with the sum by week presented as separate rows within the month. Even though the function next_day returns a date, OBIEE treated it as text.

image

To fix this, I added AS DATE in the formula to declare that the returned value was a date.

image

After I refreshed metadata on the presentation server, the query now produced the desired results.

image

Looking at column properties, everything seemed correct. The column was being treated as a date.

image

But look what happened – see the X axis -- when I graphed this.  The X-axis showed question marks instead of week ending dates.

image

After adding the graph, I examined the column properties.

image

The column had reverted to text!

To get OBIEE to recognize the results of the evaluate function as a date, one solution that seemed to work was to use the Oracle to_date function along with the next_day function.

imageNow the graph showed week ending dates on the x-axis.

image

Saturday, May 9, 2009

N:N Fact:Dimension

[Note: the following discussion is based on OBIEE version 10.1.3.4.0 and Oracle XE version 10G, both running on Windows. The behavior of other versions of OBIEE could be different.]

Typically dimensions have a 1:N relationship to facts. For example, consider a period dimension table at the day level and a fact table containing sales by day and product. For every row in the dimension table there can be many rows in the fact table. However, for every row in the fact table, there is only one associated row in the period table.

However, sometimes things are more complicated. In some cases, each row in the fact table could be associated with many rows in a dimension table. This posting is about modeling this kind of relationship in OBIEE.

Let’s take an example where this could occur. When a customer calls a call center, the customer could discuss many topics. If there is a “topic” dimension (a dimension that lists the topics that could be discussed), there could be be many topics associated with any given call.

For example, consider this Calls table. It contains data for 20 calls. 6  occurred in May, 6 in June, 5 in July, and 3 in August.  Each call was one minute long.

clip_image002

There are five topics in the Topics table, four specific topics and a catch-all “Other”.

clip_image004

One way to store which topics were discussed in which call is with the use of a table that associates topics with calls. Such a table can go by several names: “relationship table”, “associative table”,  “helper table”, “owner-member table”, or “bridge table” are some terms that have been used. Some of the rows in this table (let’s call it “CallsTopics”) could look like this:

clip_image006

This table tells us that call 1 was entirely about Bills. Call 2 was about the Bills and Other. Call 3 was about Instructions and Other. In our example database, this table has 36 rows.

The physical database schema in this data model looks like this:

clip_image008

This model could be more complex. In real life there could (and should) be a period dimension table associated with the calls. There could also be a customer dimensions, a call center employee dimension, and perhaps other dimensions. In our simple example, however, we will work with just these three tables.

The main facts are the count of calls and the duration of calls. The aggregation rule for “# Calls” is count distinct. The aggregation rule for Duration is sum.  Both of these facts come from the Calls table. In most schemas, the fact table is at the N end of all the join relationships that touch it. Notice that this schema is different: the CallTopics table is the one that is at the N end of the joins.

When modeling this in OBIEE, consider that there are two pathways to the Calls table. One logical path is from the Calls table itself used as a dimension table. The other is from the Topics table via the CallTopics table.

One way to model these two pathways is to create two logical fact table sources. One would contain the Calls table alone. The other would contain Calls and CallTopics, as shown in the following screen shot.

clip_image010

The entire business model looks like this.

clip_image012

By the way, the business model diagram shows the normal 1:N relationships between logical dimension tables and logical fact tables in that the logical fact table is at the N end of every logical join that touches it.

clip_image014

In the business model, the two fact table sources can be thought of as being at two different levels of aggregation. With two dimensions, Calls and Topics, the Calls source is at the Total level for Topics. 

clip_image016 

The other source, “CallsAndTopics”, is at the detail level for the two dimensions.

clip_image018  

A query for facts by topic using this business model produced these results:


clip_image020

The way to interpret the numbers in the Tot Duration column is that for each topic, this is the sum of the duration of calls that included that topic. In other words, 11 phone calls with a total duration of 11 minutes included the topic of billing. However, the Grand Total of 36 is a more questionable number. Shouldn’t the “Tot Duration” Grand Total be 20?

It turns out that the value shown for the Duration Grand Total depends on the method of aggregation that is being used. The aggregation rule is set in the Edit Column Formula dialog. To begin with, the default aggregation rule is inherited from the aggregation rule set for the column in the metadata. In this case, it is SUM.

clip_image022

Notice what happens when the Server Complex Aggregate rule is used instead.

clip_image024

The use of different aggregation rules resulted in different logical SQL, with REPORT_SUM used when the aggregation rule was “Default” (inherited from SUM) and AGGREGATE when “Server Complex Aggregate”.

clip_image026 

The different logical SQL statements resulted in different physical SQL statements. 


With REPORT_SUM there were two Select statements to the database (sum call duration and count distinct calls by topic in the first select; count distinct calls overall), with an additional six select statements issued to manipulate the results of the first two queries, including calculating the report sum. 


In the case of the AGGREGATE rule, an additional SQL statement was issued to sum the overall duration. This is the query that eliminated the over-counting. Then two additional select statements were used to manipulate the results of the first three queries.


Note that by changing the aggregation rule to server complex aggregate, then opening the column properties dialog and saving as the system-wide default, the AGGREGATE function will become the default aggregation rule in the future. (Re-read the warning about version-specific behavior at the beginning of this post.)

 
clip_image028 
 clip_image030

Another way to model this is to take the CALLTOPICS table out of the fact table source and create a new logical table that acts as a “bridge table” between the Topics table and the Calls table. Note that the Bridge table property is checked here.

clip_image032 

The business model now looks like this.

clip_image034 

The default results are shown here.

 clip_image036 

The logical SQL issued uses the AGGREGATE rule by default.

clip_image038 

Switching the aggregation rule in the column formula dialog to SUM changes the results and the logical SQL.

 clip_image040 
 clip_image042 

Summary: In data models such as this where you have the possibility of double counting results in totals, be aware that the totals will depend on the aggregation rules set for the measures on the Criteria tab.


Default behavior depends on how the business model is configured and which aggregation rule is being used as the default.