Monday, October 29, 2007

SQL Model Clause tutorial, part two

SQL Model Clause Tutorial, part one
SQL Model Clause Tutorial, part three

If you prefer to read in French, then you can find a translated version of this tutorial here, and if you prefer to read in German, then you can find a translated version here.

Multi-cell references

It is possible to address multiple cells with a single rule. This is done by doing a so-called multi-cell reference. To show this I will introduce a new demonstration table with a composite primary key to be able to work with multiple dimensions. The table holds monthly sales numbers of two books in 2005 and 2006 and is created like this:

SQL> create table sales
  2  as
  3  select 'The Da Vinci Code' book, date '2005-03-01' month, 'Netherlands' country, 5 amount from dual union all
  4  select 'The Da Vinci Code', date '2005-04-01', 'Netherlands', 8 from dual union all
  5  select 'The Da Vinci Code', date '2005-05-01', 'Netherlands', 3 from dual union all
  6  select 'The Da Vinci Code', date '2005-07-01', 'Netherlands', 2 from dual union all
  7  select 'The Da Vinci Code', date '2005-10-01', 'Netherlands', 1 from dual union all
  8  select 'The Da Vinci Code', date '2005-02-01', 'United Kingdom', 15 from dual union all
  9  select 'The Da Vinci Code', date '2005-03-01', 'United Kingdom', 33 from dual union all
 10  select 'The Da Vinci Code', date '2005-04-01', 'United Kingdom', 47 from dual union all
 11  select 'The Da Vinci Code', date '2005-05-01', 'United Kingdom', 44 from dual union all
 12  select 'The Da Vinci Code', date '2005-06-01', 'United Kingdom', 11 from dual union all
 13  select 'The Da Vinci Code', date '2005-08-01', 'United Kingdom', 2 from dual union all
 14  select 'The Da Vinci Code', date '2005-05-01', 'France', 2 from dual union all
 15  select 'The Da Vinci Code', date '2005-08-01', 'France', 3 from dual union all
 16  select 'The Da Vinci Code', date '2006-01-01', 'France', 4 from dual union all
 17  select 'Bosatlas', date '2005-01-01', 'Netherlands', 102 from dual union all
 18  select 'Bosatlas', date '2005-02-01', 'Netherlands', 55 from dual union all
 19  select 'Bosatlas', date '2005-03-01', 'Netherlands', 68 from dual union all
 20  select 'Bosatlas', date '2005-04-01', 'Netherlands', 42 from dual union all
 21  select 'Bosatlas', date '2005-05-01', 'Netherlands', 87 from dual union all
 22  select 'Bosatlas', date '2005-06-01', 'Netherlands', 40 from dual union all
 23  select 'Bosatlas', date '2005-07-01', 'Netherlands', 31 from dual union all
 24  select 'Bosatlas', date '2005-08-01', 'Netherlands', 26 from dual union all
 25  select 'Bosatlas', date '2005-09-01', 'Netherlands', 22 from dual union all
 26  select 'Bosatlas', date '2005-10-01', 'Netherlands', 23 from dual union all
 27  select 'Bosatlas', date '2005-11-01', 'Netherlands', 88 from dual union all
 28  select 'Bosatlas', date '2005-12-01', 'Netherlands', 143 from dual union all
 29  select 'Bosatlas', date '2006-01-01', 'Netherlands', 31 from dual union all
 30  select 'Bosatlas', date '2006-02-01', 'Netherlands', 18 from dual union all
 31  select 'Bosatlas', date '2006-03-01', 'Netherlands', 15 from dual union all
 32  select 'Bosatlas', date '2006-04-01', 'Netherlands', 11 from dual union all
 33  select 'Bosatlas', date '2006-05-01', 'Netherlands', 17 from dual union all
 34  select 'Bosatlas', date '2006-06-01', 'Netherlands', 9 from dual union all
 35  select 'Bosatlas', date '2006-07-01', 'Netherlands', 12 from dual union all
 36  select 'Bosatlas', date '2006-08-01', 'Netherlands', 20 from dual union all
 37  select 'Bosatlas', date '2006-09-01', 'Netherlands', 4 from dual union all
 38  select 'Bosatlas', date '2006-10-01', 'Netherlands', 5 from dual union all
 39  select 'Bosatlas', date '2006-11-01', 'Netherlands', 1 from dual union all
 40  select 'Bosatlas', date '2006-12-01', 'Netherlands', 1 from dual
 41  /

Tabel is aangemaakt.


The book called “Bosatlas” has a record for every month, but only in the Netherlands. “The Da Vinci Code” has been sold in three countries, but not for every month. The columns book, month and country make up the primary key for this table. To show the months a little better, I change the date format mask like this:

SQL> alter session set nls_date_format = 'fmmonth yyyy'
2 /


The example below shows how all sales of “Bosatlas” after june 2006 are doubled:

SQL> select book
  2       , month
  3       , country
  4       , amount
  5    from sales
  6   model
  7         return updated rows
  8         partition by (country)
  9         dimension by (book,month)
 10         measures (amount)
 11         rules
 12         ( amount['Bosatlas',month > date '2006-06-01'] =
 13             amount['Bosatlas',cv(month)] * 2
 14         )
 15  /

BOOK              MONTH          COUNTRY            AMOUNT
----------------- -------------- -------------- ----------
Bosatlas          juli 2006      Netherlands            24
Bosatlas          augustus 2006  Netherlands            40
Bosatlas          september 2006 Netherlands             8
Bosatlas          oktober 2006   Netherlands            10
Bosatlas          november 2006  Netherlands             2
Bosatlas          december 2006  Netherlands             2

6 rijen zijn geselecteerd.


Measure amount now has two dimensions. So we have to provide a book as well as a month, to refer to a cell in the model. With one rule six cells are being modified, due to using the expression month > date ‘2006-06-01’ on the left side of the rule. At the right side the function cv is used to refer to the corresponding value at the left side, where cv stands for current value. Instead of cv(month), I also could have used cv(); because there is only one multi-cell reference present. Even if more multi-cell references are used, cv() may be used, provided it is clear which dimension is being referred to. If this is not clear, you’ll get an ORA-32611:

FOUT in regel N:
.ORA-32611: onjuist gebruik van operator MODEL CV


When you want to refer to all dimension values, instead of a subset of the values, you use the word ANY:

SQL> select book
  2       , month
  3       , country
  4       , amount
  5    from sales
  6   model
  7         return updated rows
  8         partition by (country)
  9         dimension by (book, month)
 10         measures (amount)
 11         rules
 12         ( amount[any,date '2005-08-01'] = 200
 13         )
 14   order by book, month
 15  /

BOOK              MONTH          COUNTRY            AMOUNT
----------------- -------------- -------------- ----------
Bosatlas          augustus 2005  Netherlands           200
The Da Vinci Code augustus 2005  United Kingdom        200
The Da Vinci Code augustus 2005  France                200

3 rijen zijn geselecteerd.


In this rule the word ANY is used to refer to all books that have sales records for august 2005. Multi-cell references are also possible with a FOR keyword, for example: amount[‘Bosatlas’,for month from date ‘2005-03-01’ to date ‘2005-08-01’ increment 1], and with a BETWEEN, for example: aantal[‘Bosatlas’],month between date ‘2005-03-01’ and date ‘2005-08-01’], and with all other comparison operators.

Reference models

Reference models are sub models within a model. The main model is able to use all values provided in the reference model, but the values from the reference model are not accessible to the outer select list. We are dealing here with auxiliary data that is read only for the main model. To illustrate this I create an extra table storing prices of the two books:

SQL> create table prices
  2  as
  3  select 'Bosatlas' book, 42.95 price from dual union all
  4  select 'The Da Vinci Code', 19.95 from dual
  5  /

Tabel is aangemaakt.


The two rows in the price table are supplied to the main model by means of a reference model:

SQL> select book
  2       , month
  3       , country
  4       , amount
  5       , to_char(turnover,'99G990D00') turnover
  6    from sales
  7   where month between date '2005-07-01' and date '2005-12-31'
  8   model
  9         reference prices on (select book, price from prices)
 10           dimension by (book)
 11           measures (price)
 12         main result
 13           partition by (country)
 14           dimension by (book, month)
 15           measures (0 as turnover, amount)
 16           rules
 17           ( turnover[any,any] = amount[cv(),cv()] * price[cv(book)]
 18           )
 19   order by book
 20       , month
 21  /

BOOK              MONTH          COUNTRY            AMOUNT TURNOVER
----------------- -------------- -------------- ---------- ----------
Bosatlas          juli 2005      Netherlands            31   1.331,45
Bosatlas          augustus 2005  Netherlands            26   1.116,70
Bosatlas          september 2005 Netherlands            22     944,90
Bosatlas          oktober 2005   Netherlands            23     987,85
Bosatlas          november 2005  Netherlands            88   3.779,60
Bosatlas          december 2005  Netherlands           143   6.141,85
The Da Vinci Code juli 2005      Netherlands             2      39,90
The Da Vinci Code augustus 2005  United Kingdom          2      39,90
The Da Vinci Code augustus 2005  France                  3      59,85
The Da Vinci Code oktober 2005   Netherlands             1      19,95

10 rijen zijn geselecteerd.


So the reference model also contains dimensions and measures, but it cannot contain a partition clause nor its own rules. You also may have noticed the fact that both the main model and the reference model are now named: the reference model is called prices, prefixed by the keyword REFERENCE, and the main model is called result, prefixed by the keyword MAIN. The measure price from the reference model is being used inside the rule of the main model by the phrase price[cv(book)]. Also new here is the creation of an extra measure called turnover, initially filled with 0 for all existing cells.

Iteration

Iteration offers the opportunity to repeat the same rule multiple times on the same model. In the example below the number of sales of The Da Vinci Code is taken to the fourth power, without using the POWER function, just to show how iterations work. To do so we have to multiply the number of sales three times with the original number of sales, like this:

SQL> select book
  2       , month
  3       , country
  4       , a1 amount
  5       , a2 amount_to_the_fourth
  6    from sales
  7   where book = 'The Da Vinci Code'
  8     and country = 'Netherlands'
  9   model
 10         partition by (country)
 11         dimension by (book,month)
 12         measures (amount a1, amount a2)
 13         rules iterate (3)
 14         ( a2[any,any] = a2[cv(),cv()] * a1[cv(),cv()]
 15         )
 16   order by month
 17  /

BOOK              MONTH          COUNTRY            AMOUNT AMOUNT_TO_THE_FOURTH
----------------- -------------- -------------- ---------- --------------------
The Da Vinci Code maart 2005     Netherlands             5                  625
The Da Vinci Code april 2005     Netherlands             8                 4096
The Da Vinci Code mei 2005       Netherlands             3                   81
The Da Vinci Code juli 2005      Netherlands             2                   16
The Da Vinci Code oktober 2005   Netherlands             1                    1

5 rijen zijn geselecteerd.


Directly after the RULES keyword, we place the keyword ITERATE, followed by the number of iterations. Measure a1 contains all original and unchanged number of sales. Measure a2 initially contains the numbers 5, 8, 3, 2 and 1; the same as a1. During each of the three iterations these numbers are multiplied with the numbers of measure a1. So after 1 iteration, a2 contains 25, 64, 9, 4 and 1, and after the second iteration 125, 512, 27, 8 and 1. And finally the numbers are as can be seen in the result set above.

Very handy in working with iterations is the keyword UNTIL followed by a boolean expression to be able skip unnecessary iterations. The word ITERATION_NUMBER can be used in the rule which holds the number of iterations. But be warned: it starts with 0. So in the example above, with iterate (3), it increases from 0 until 2.

To show how UNTIL and ITERATION_NUMBER work, I came up with an example where predictions of future sales are generated according to the rule that they will drop with 75% each year. So each year we will only sell 25% of the books as the same month previous year, where any fractions are truncated. The statement will have to generate sales records until there isn’t a single book sold in an entire year. This query will do that for you:

SQL> select book
  2       , country
  3       , to_date(to_char(y) || to_char(m),'yyyymm') month
  4       , amount
  5    from sales
  6   where book = 'Bosatlas'
  7     and extract (year from month) = 2006
  8   model
  9         partition by ( book, country)
 10         dimension by ( extract(year from month) y, extract(month from month) m)
 11         measures (amount, 0 max_monthly_amount)
 12         rules upsert all
 13         iterate (100) until (max_monthly_amount[2007+iteration_number,1] < 4)
 14         ( amount[2007+iteration_number,any]
 15           = trunc(amount[2006+iteration_number,cv()]/4)
 16         , max_monthly_amount[2007+iteration_number,1]
 17           = max(amount)[2007+iteration_number,m between 1 and 12]
 18         )
 19   order by y, m
 20  /


Now this is a lot more complex than what we have seen before. As you can see by the where clause, we’ll start with the 12 rows for the book “Bosatlas” in 2006. The month column is split into two dimensions: the year and the month number. This is done to be able to make 12 rows for an entire year in one shot. To prevent making a year with only 0 sales for all months, I created an auxiliary measure called max_monthly_amount. This measure is populated by an analytic function with the syntax max(amount)[2007+iteration_number, month_number between 1 and 12] and gives the maximum sales number for the current year. This measure is used in the UNTIL clause. The iterations should stop when this highest number is below 4. For if there is a sales amount of 4 or higher, this will lead to a non zero number in the next year. With the current data set, out of the specified 100 iterations only two iterations are executed leading to this result set:

BOOK              COUNTRY        MONTH              AMOUNT
----------------- -------------- -------------- ----------
Bosatlas          Netherlands    januari 2006           31
Bosatlas          Netherlands    februari 2006          18
Bosatlas          Netherlands    maart 2006             15
Bosatlas          Netherlands    april 2006             11
Bosatlas          Netherlands    mei 2006               17
Bosatlas          Netherlands    juni 2006               9
Bosatlas          Netherlands    juli 2006              12
Bosatlas          Netherlands    augustus 2006          20
Bosatlas          Netherlands    september 2006          4
Bosatlas          Netherlands    oktober 2006            5
Bosatlas          Netherlands    november 2006           1
Bosatlas          Netherlands    december 2006           1
Bosatlas          Netherlands    januari 2007            7
Bosatlas          Netherlands    februari 2007           4
Bosatlas          Netherlands    maart 2007              3
Bosatlas          Netherlands    april 2007              2
Bosatlas          Netherlands    mei 2007                4
Bosatlas          Netherlands    juni 2007               2
Bosatlas          Netherlands    juli 2007               3
Bosatlas          Netherlands    augustus 2007           5
Bosatlas          Netherlands    september 2007          1
Bosatlas          Netherlands    oktober 2007            1
Bosatlas          Netherlands    november 2007           0
Bosatlas          Netherlands    december 2007           0
Bosatlas          Netherlands    januari 2008            1
Bosatlas          Netherlands    februari 2008           1
Bosatlas          Netherlands    maart 2008              0
Bosatlas          Netherlands    april 2008              0
Bosatlas          Netherlands    mei 2008                1
Bosatlas          Netherlands    juni 2008               0
Bosatlas          Netherlands    juli 2008               0
Bosatlas          Netherlands    augustus 2008           1
Bosatlas          Netherlands    september 2008          0
Bosatlas          Netherlands    oktober 2008            0
Bosatlas          Netherlands    november 2008           0
Bosatlas          Netherlands    december 2008           0

36 rijen zijn geselecteerd.


In this query the UPSERT ALL mode is used. It is one of the three possible modes: UPDATE, UPSERT and UPSERT ALL, where UPSERT is the default and this mode was used in all previous examples. In UPDATE mode it is not possible to create new rows as can be done with UPSERT. The difference between UPSERT and UPSERT ALL is that the latter creates new cells if the left hand side of the rule contains non-positional references. For example: if the left hand side of the rule contains the ANY keyword, in UPSERT mode it would look at all existing cells and find all matches. Using the UPSERT ALL mode, it would create new cells, one for each match of the non-positional dimension values.

More functions and features are available with the model clause, which really are worth the study, but I will only mention them here briefly. First of all there are the functions PRESENTV and PRESENTNNV, the predicate IS PRESENT and the keywords KEEP NAV and IGNORE NAV. In short you can distinguish between missing cells and existing cells with NULL values with these. Then there is the PREVIOUS function to be able to refer to the previous iteration value inside the UNTIL clause during iterations. And finally, I mention the clauses AUTOMATIC ORDER and SEQUENTIAL ORDER, which gives the possibility to Oracle to decide for itself in which order the rules should be evaluated. All clauses and functions mentioned in this paragraph are quite clearly explained in the documentation.

In the last part I will try to explain what practical use the model clause can have. I will show some solutions for previously unsolvable questions and for common problems that can be solved more efficiently than with previously known techniques. Frequent visitors of the OTN Forums will undoubtedly know which problems I am hinting at here. I will try to get this last part published in a magazine first. If it succeeds, I will point to the online version of course. If it does not succeed, then I will post the third part here. So I’ll have to ask for your patience on this one.

SQL script with all statements used

Wednesday, October 24, 2007

Oracle11g for Windows

A quick intermezzo before publishing part two of the SQL model clause tutorial.

Two and a half months after the Oracle11g Linux download became available, Oracle made the Windows version available two days ago. Last night I downloaded it and this evening I finished the install and created my good old SQL*Plus shortcut. I was happy to see the install did not impose any problems:

SQL*Plus: Release 11.1.0.6.0 - Production on Wo Okt 24 22:25:31 2007

Copyright (c) 1982, 2007, Oracle. All rights reserved.


Verbonden met:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

Sessie: 137,77
Eventueel trace-bestand: c:\oracle\diag\rdbms\ora11g\ora11gr1\trace\ora11gr1_ora_2592.trc

rwijk@ORA11G>

Monday, October 1, 2007

SQL Model Clause Tutorial, part one

SQL Model Clause Tutorial, part two
SQL Model Clause Tutorial, part three

If you prefer to read in French, then you can find a translated version of this tutorial here, and if you prefer to read in German, then you can find a translated version here.


With the SQL model clause you build one or more matrixes with a variable number of dimensions. This is called the model. The model uses a subset of the available columns from your FROM clause. It contains at least one dimension, at least one measure and optionally one or more partitions. You can think of a model as a spreadsheet file containing separate worksheets for each calculated value (measures). A worksheet has a X- and an Y-axis (two dimensions) and you can imagine having your worksheets split up in several identical areas, each for a different country or department (partition).

The next figure shows a model of the well known EMP table, where deptno is a partition, empno a dimension and sal and comm are two measures:



Once your model is setup, you define rules that modify your measure values. These rules are the core of the model clause. With few rules you are able to make complex calculations on your data and you are able to create new rows as well. The measure columns are now arrays that are indexed by the dimension columns, where the rules are applied to all partitions of this array. After all rules are applied, the model is converted back to traditional rows.

My experience with the model clause wire diagram in the Oracle documentation is that it is quite complex and it tends to scare off people. That’s not what I’d like to achieve here, so I’ll use a different approach using lots of small examples using the EMP and DEPT table, starting with very simple ones, and gradually expanding on that. At the end of this piece, you’ll find a script you can download and run on your own database.

 SQL> select empno
  2       , ename
  3       , sal
  4    from emp
  5   where deptno = 10
  6  /

     EMPNO ENAME             SAL
---------- ---------- ----------
      7782 CLARK            2450
      7839 KING             5000
      7934 MILLER           1300

3 rijen zijn geselecteerd.


These are the known contents of table EMP for department 10. The equivalent of this SQL statement using a model clause, that does nothing:

SQL> select empno
  2       , ename
  3       , sal
  4    from emp
  5   where deptno = 10
  6   model
  7         dimension by (empno)
  8         measures (ename, sal)
  9         rules
 10         ()
 11  /

     EMPNO ENAME             SAL
---------- ---------- ----------
      7782 CLARK            2450
      7839 KING             5000
      7934 MILLER           1300

3 rijen zijn geselecteerd.


Here we have two measures, ename and sal, and one dimension being empno. The combination of all partitions and dimension columns have to be unique. This is checked at runtime, and if violated, you’ll get an ORA-32638. Because of the two measures, two one dimensional arrays are internally created, which are index by empno. At line 9 you see the keyword “RULES”, but without contents yet. This is the place where the rules will be defined doing the calculations and the creations of new rows. The keyword “RULES” is optional, but for sake of clarity I will always write it down. When done modeling, all partitions, dimensions and measures are converted back to columns in traditional rows, which means that you can only have columns in your select list that appear in your model. If, for instance, I did not include the column ename as a measure, I would get this error message

 SQL> select empno
  2       , ename
  3       , sal
  4    from emp
  5   where deptno = 10
  6   model
  7         dimension by (empno)
  8         measures (sal)
  9         rules
 10         ()
 11  /
     , ename
       *
FOUT in regel 2:
.ORA-32614: ongeldige uitdrukking MODEL SELECT


In the next example I create a new row:

 SQL> select empno
  2       , ename
  3       , sal
  4    from emp
  5   where deptno = 10
  6   model
  7         dimension by (empno)
  8         measures (ename,sal)
  9         rules
 10         ( ename[7777] = 'VAN WIJK'
 11         )
 12  /

     EMPNO ENAME             SAL
---------- ---------- ----------
      7782 CLARK            2450
      7839 KING             5000
      7934 MILLER           1300
      7777 VAN WIJK

4 rijen zijn geselecteerd.


The rule at line number 10 shows how measure ename is expanded with dimension 7777 and gets the value ‘VAN WIJK’. If table EMP would have contained an empno 7777 already, then the existing ename value for empno would have been overwritten by this rule. But 7777 does not exist in EMP, so now a new cell has been created, which shows itself as a new row in the result set. Please note that the row is not being inserted into the table, but only to the result set of the query. By adding a second rule you can fill the sal column as well:

 SQL> select empno
  2       , ename
  3       , sal
  4    from emp
  5   where deptno = 10
  6   model
  7         dimension by (empno)
  8         measures (ename,sal)
  9         rules
 10         ( ename[7777] = 'VAN WIJK'
 11         , sal[7777] = 2500
 12         )
 13  /

     EMPNO ENAME             SAL
---------- ---------- ----------
      7782 CLARK            2450
      7839 KING             5000
      7934 MILLER           1300
      7777 VAN WIJK         2500

4 rijen zijn geselecteerd.


The query returns both the existing rows as the new rows. With the keywords “RETURN UPDATED ROWS” you have the possibility to return only the new and updated ones:

 SQL> select empno
  2       , ename
  3       , sal
  4    from emp
  5   where deptno = 10
  6   model
  7         return updated rows
  8         dimension by (empno)
  9         measures (ename,sal)
 10         rules
 11         ( ename[7777] = 'VAN WIJK'
 12         , sal[7777] = 2500
 13         )
 14  /

     EMPNO ENAME             SAL
---------- ---------- ----------
      7777 VAN WIJK         2500

1 rij is geselecteerd.


All calculations are being performed on each partition. We can see this happening when we drop the filter predicate “deptno = 10” and show the deptno column instead. Now the question becomes whether to include the deptno column as a partition or as a measure. First, let’s explore what happens if we define deptno as a measure:

 SQL> select empno
  2       , ename
  3       , sal
  4       , deptno
  5    from emp
  6   model
  7         return updated rows
  8         dimension by (empno)
  9         measures (ename,sal,deptno)
 10         rules
 11         ( ename[7777] = 'VAN WIJK'
 12         )
 13  /

     EMPNO ENAME             SAL     DEPTNO
---------- ---------- ---------- ----------
      7777 VAN WIJK

1 rij is geselecteerd.


One row is created, as expected, with a NULL deptno value. Now let’s explore what happens if we define deptno as a partition:

 SQL> select empno
  2       , ename
  3       , sal
  4       , deptno
  5    from emp
  6   model
  7         return updated rows
  8         partition by (deptno)
  9         dimension by (empno)
 10         measures (ename,sal)
 11         rules
 12         ( ename[7777] = 'VAN WIJK'
 13         )
 14  /

     EMPNO ENAME             SAL     DEPTNO
---------- ---------- ---------- ----------
      7777 VAN WIJK                      30
      7777 VAN WIJK                      20
      7777 VAN WIJK                      10

3 rijen zijn geselecteerd.


A clear difference. In table EMP only the deptno values 10, 20 and 30 occur and therefore this model effectively has three partitions. The rule is applied to all three partitions, thus resulting in the addition of three new rows.

So far the examples will probably not have persuaded you to use the model clause. A lot of extra lines of SQL code were introduced in order to get something done which is quite simple: with the creative use the UNION ALL set operator and table DUAL we could have achieved the same results. The intention of this first part is only to show the basics. In the next part I will show multi cell references, references models and iterations. And this is where things become more interesting.

SQL script with all statements used