Monday, October 1, 2007

SQL Model Clause Tutorial, part one

If you prefer to read in French, then you can find a translated version of this tutorial here, and if you prefer to read in German, then you can find a translated version here.


With the SQL model clause you build one or more matrixes with a variable number of dimensions. This is called the model. The model uses a subset of the available columns from your FROM clause. It contains at least one dimension, at least one measure and optionally one or more partitions. You can think of a model as a spreadsheet file containing separate worksheets for each calculated value (measures). A worksheet has a X- and an Y-axis (two dimensions) and you can imagine having your worksheets split up in several identical areas, each for a different country or department (partition).

The next figure shows a model of the well known EMP table, where deptno is a partition, empno a dimension and sal and comm are two measures:



Once your model is setup, you define rules that modify your measure values. These rules are the core of the model clause. With few rules you are able to make complex calculations on your data and you are able to create new rows as well. The measure columns are now arrays that are indexed by the dimension columns, where the rules are applied to all partitions of this array. After all rules are applied, the model is converted back to traditional rows.

My experience with the model clause wire diagram in the Oracle documentation is that it is quite complex and it tends to scare off people. That’s not what I’d like to achieve here, so I’ll use a different approach using lots of small examples using the EMP and DEPT table, starting with very simple ones, and gradually expanding on that. At the end of this piece, you’ll find a script you can download and run on your own database.

 SQL> select empno
2 , ename
3 , sal
4 from emp
5 where deptno = 10
6 /

EMPNO ENAME SAL
---------- ---------- ----------
7782 CLARK 2450
7839 KING 5000
7934 MILLER 1300

3 rijen zijn geselecteerd.


These are the known contents of table EMP for department 10. The equivalent of this SQL statement using a model clause, that does nothing:

SQL> select empno
2 , ename
3 , sal
4 from emp
5 where deptno = 10
6 model
7 dimension by (empno)
8 measures (ename, sal)
9 rules
10 ()
11 /

EMPNO ENAME SAL
---------- ---------- ----------
7782 CLARK 2450
7839 KING 5000
7934 MILLER 1300

3 rijen zijn geselecteerd.


Here we have two measures, ename and sal, and one dimension being empno. The combination of all partitions and dimension columns have to be unique. This is checked at runtime, and if violated, you’ll get an ORA-32638. Because of the two measures, two one dimensional arrays are internally created, which are index by empno. At line 9 you see the keyword “RULES”, but without contents yet. This is the place where the rules will be defined doing the calculations and the creations of new rows. The keyword “RULES” is optional, but for sake of clarity I will always write it down. When done modeling, all partitions, dimensions and measures are converted back to columns in traditional rows, which means that you can only have columns in your select list that appear in your model. If, for instance, I did not include the column ename as a measure, I would get this error message

 SQL> select empno
2 , ename
3 , sal
4 from emp
5 where deptno = 10
6 model
7 dimension by (empno)
8 measures (sal)
9 rules
10 ()
11 /
, ename
*
FOUT in regel 2:
.ORA-32614: ongeldige uitdrukking MODEL SELECT


In the next example I create a new row:

 SQL> select empno
2 , ename
3 , sal
4 from emp
5 where deptno = 10
6 model
7 dimension by (empno)
8 measures (ename,sal)
9 rules
10 ( ename[7777] = 'VAN WIJK'
11 )
12 /

EMPNO ENAME SAL
---------- ---------- ----------
7782 CLARK 2450
7839 KING 5000
7934 MILLER 1300
7777 VAN WIJK

4 rijen zijn geselecteerd.


The rule at line number 10 shows how measure ename is expanded with dimension 7777 and gets the value ‘VAN WIJK’. If table EMP would have contained an empno 7777 already, then the existing ename value for empno would have been overwritten by this rule. But 7777 does not exist in EMP, so now a new cell has been created, which shows itself as a new row in the result set. Please note that the row is not being inserted into the table, but only to the result set of the query. By adding a second rule you can fill the sal column as well:

 SQL> select empno
2 , ename
3 , sal
4 from emp
5 where deptno = 10
6 model
7 dimension by (empno)
8 measures (ename,sal)
9 rules
10 ( ename[7777] = 'VAN WIJK'
11 , sal[7777] = 2500
12 )
13 /

EMPNO ENAME SAL
---------- ---------- ----------
7782 CLARK 2450
7839 KING 5000
7934 MILLER 1300
7777 VAN WIJK 2500

4 rijen zijn geselecteerd.


The query returns both the existing rows as the new rows. With the keywords “RETURN UPDATED ROWS” you have the possibility to return only the new and updated ones:

 SQL> select empno
2 , ename
3 , sal
4 from emp
5 where deptno = 10
6 model
7 return updated rows
8 dimension by (empno)
9 measures (ename,sal)
10 rules
11 ( ename[7777] = 'VAN WIJK'
12 , sal[7777] = 2500
13 )
14 /

EMPNO ENAME SAL
---------- ---------- ----------
7777 VAN WIJK 2500

1 rij is geselecteerd.


All calculations are being performed on each partition. We can see this happening when we drop the filter predicate “deptno = 10” and show the deptno column instead. Now the question becomes whether to include the deptno column as a partition or as a measure. First, let’s explore what happens if we define deptno as a measure:

 SQL> select empno
2 , ename
3 , sal
4 , deptno
5 from emp
6 model
7 return updated rows
8 dimension by (empno)
9 measures (ename,sal,deptno)
10 rules
11 ( ename[7777] = 'VAN WIJK'
12 )
13 /

EMPNO ENAME SAL DEPTNO
---------- ---------- ---------- ----------
7777 VAN WIJK

1 rij is geselecteerd.


One row is created, as expected, with a NULL deptno value. Now let’s explore what happens if we define deptno as a partition:

 SQL> select empno
2 , ename
3 , sal
4 , deptno
5 from emp
6 model
7 return updated rows
8 partition by (deptno)
9 dimension by (empno)
10 measures (ename,sal)
11 rules
12 ( ename[7777] = 'VAN WIJK'
13 )
14 /

EMPNO ENAME SAL DEPTNO
---------- ---------- ---------- ----------
7777 VAN WIJK 30
7777 VAN WIJK 20
7777 VAN WIJK 10

3 rijen zijn geselecteerd.


A clear difference. In table EMP only the deptno values 10, 20 and 30 occur and therefore this model effectively has three partitions. The rule is applied to all three partitions, thus resulting in the addition of three new rows.

So far the examples will probably not have persuaded you to use the model clause. A lot of extra lines of SQL code were introduced in order to get something done which is quite simple: with the creative use the UNION ALL set operator and table DUAL we could have achieved the same results. The intention of this first part is only to show the basics. In the next part I will show multi cell references, references models and iterations. And this is where things become more interesting.

SQL script with all statements used

5 comments:

  1. Hi, Rob. AFAIU you finally got some free time and began to translate your dutch article on Model clause into english.
    At least the first part is so easy that it can be used as a school-book for those who have difficulties with understanding the description in the manuals or who are lazy to read it (manual I mean) :))
    Hope to see next parts soon.

    ReplyDelete
  2. Rob,
    Finally you got some time to have English version of MODEL article.
    Thank you on behalf of every one.
    Devang

    http://forums.oracle.com/forums/thread.jspa?messageID=1850848�

    http://forums.oracle.com/forums/thread.jspa?messageID=1864580�
    :-)

    ReplyDelete
  3. Thanks for your blog.
    Very well explained :)
    Helped clear a few things up for me.

    ReplyDelete
  4. Hi Rob,
    Thanks a lot for this useful article! I've just set something so straight clear now, regarding partitions vs dimensions. :)

    Thanks a lot,
    Mihai

    ReplyDelete