Monday, November 26, 2007

Setting up journaling using flashback data archive

One of the new features in 11g is Flashback Data Archive, or in marketing terms: Oracle Total Recall. With earlier flashback technologies it was possible to access Oracle's undo information for as long as it was available. With flashback archives you can extend this period for as long as you specify. This makes it possible to implement journaling using this new technique. This blog entry describes how to do this, along with a number of things you need to be aware of.

Before 11g, journaling was typically implemented using a separate table with the same columns as the base table, plus three additional columns: jn_operation (Insert/Update/Delete), jn_user and jn_datetime. Three after-row triggers on the base table made sure every DML statement was recorded in the journaling table. So let's see how the same can be achieved using flashback archives.

First of all I create a tablespace especially for flashback archives:

rwijk@ORA11G> create tablespace my_tablespace datafile 'extra_file.dat' size 10M
2 /

Tablespace created.


In the tablespace I can define several flashback archives each with its own retention period. For example:

rwijk@ORA11G> create flashback archive flashback_archive_10_years
2 tablespace my_tablespace
3 retention 10 year
4 /

Flashback archive created.

rwijk@ORA11G> create flashback archive flashback_archive_5_years
2 tablespace my_tablespace
3 retention 5 year
4 /

Flashback archive created.


And I create a table my_emp and attach it to the flashback archive with the 10 year retention period:

rwijk@ORA11G> create table my_emp
2 ( empno number(4) not null
3 , ename varchar2(10)
4 , sal number(7,2)
5 , deptno number(2)
6 ) partition by range (deptno)
7 ( partition deptno10 values less than (11)
8 , partition deptno20 values less than (21)
9 , partition deptno30 values less than (31)
10 , partition deptnorest values less than (maxvalue)
11 ) flashback archive flashback_archive_10_years
12 /

Table created.


The range partitioning is not required; I only do this to show one point later on in this blog entry.

The information about the flashback archives is available in the views DBA_FLASHBACK_ARCHIVE, DBA_FLASHBACK_ARCHIVE_TS and DBA_FLASHBACK_ARCHIVE_TABLES:

rwijk@ORA11G> select * from dba_flashback_archive
2 /

FLASHBACK_ARCHIVE_NAME FLASHBACK_ARCHIVE# RETENTION_IN_DAYS
-------------------------- ------------------ -----------------
CREATE_TIME LAST_PURGE_TIME STATUS
------------------------------ ------------------------------ -------
FLASHBACK_ARCHIVE_5_YEARS 2 1825
27-11-07 23:53:51,000000000 27-11-07 23:53:51,000000000

FLASHBACK_ARCHIVE_10_YEARS 1 3650
27-11-07 23:53:51,000000000 27-11-07 23:53:51,000000000


2 rows selected.

rwijk@ORA11G> select * from dba_flashback_archive_ts
2 /

FLASHBACK_ARCHIVE_NAME FLASHBACK_ARCHIVE# TABLESPACE_NAME
-------------------------- ------------------ ------------------------------
QUOTA_IN_MB
----------------------------------------
FLASHBACK_ARCHIVE_5_YEARS 2 MY_TABLESPACE


FLASHBACK_ARCHIVE_10_YEARS 1 MY_TABLESPACE



2 rows selected.

rwijk@ORA11G> select * from dba_flashback_archive_tables
2 /

TABLE_NAME OWNER_NAME
------------------------------ ------------------------------
FLASHBACK_ARCHIVE_NAME ARCHIVE_TABLE_NAME
-------------------------- ------------------------------
MY_EMP RWIJK
FLASHBACK_ARCHIVE_10_YEARS SYS_FBA_HIST_71097


1 row selected.


Now let's do some DML against this new table:

rwijk@ORA11G> insert into my_emp
2 select empno, ename, sal, deptno from emp
3 /

14 rows created.

rwijk@ORA11G> commit
2 /

Commit complete.

rwijk@ORA11G> update my_emp
2 set sal = sal * 1.1
3 where deptno = 20
4 /

5 rows updated.

rwijk@ORA11G> commit
2 /

Commit complete.

rwijk@ORA11G> update my_emp
2 set ename = initcap(ename)
3 where deptno != 30
4 /

8 rows updated.

rwijk@ORA11G> commit
2 /

Commit complete.

rwijk@ORA11G> insert into my_emp (empno,ename,sal,deptno)
2 values (7777,'VAN WIJK', 1600, 40)
3 /

1 row created.

rwijk@ORA11G> delete my_emp
2 where sal = 3300
3 /

2 rows deleted.

rwijk@ORA11G> commit
2 /

Commit complete.


And use a Flashback Version Query to retrieve all information:

rwijk@ORA11G> select versions_xid
2 , versions_startscn
3 , versions_starttime
4 , versions_endscn
5 , versions_endtime
6 , versions_operation
7 , empno
8 , ename
9 , sal
10 , deptno
11 from my_emp versions between scn minvalue and maxvalue
12 order by empno
13 , versions_startscn
14 /

VERSIONS_XID VERSIONS_STARTSCN VERSIONS_STARTTIME VERSIONS_ENDSCN
---------------- ----------------- -------------------- ---------------
VERSIONS_ENDTIME V EMPNO ENAME SAL DEPTNO
-------------------- - ----- ---------- ----- ------
0A00030079030000 1589358 27-11-07 23:58:39 1589361
27-11-07 23:58:39 I 7369 SMITH 800 20

0400070075030000 1589361 27-11-07 23:58:39 1589365
27-11-07 23:58:39 U 7369 SMITH 880 20

0700110078030000 1589365 27-11-07 23:58:39
U 7369 Smith 880 20

0A00030079030000 1589358 27-11-07 23:58:39
I 7499 ALLEN 1600 30

0A00030079030000 1589358 27-11-07 23:58:39
I 7521 WARD 1250 30

0A00030079030000 1589358 27-11-07 23:58:39 1589361
27-11-07 23:58:39 I 7566 JONES 2975 20

0400070075030000 1589361 27-11-07 23:58:39 1589365
27-11-07 23:58:39 U 7566 JONES 3273 20

0700110078030000 1589365 27-11-07 23:58:39
U 7566 Jones 3273 20

0A00030079030000 1589358 27-11-07 23:58:39
I 7654 MARTIN 1250 30

0A00030079030000 1589358 27-11-07 23:58:39
I 7698 BLAKE 2850 30

09001C00B5040000 1589368 27-11-07 23:58:39
I 7777 VAN WIJK 1600 40

0A00030079030000 1589358 27-11-07 23:58:39 1589365
27-11-07 23:58:39 I 7782 CLARK 2450 10

0700110078030000 1589365 27-11-07 23:58:39
U 7782 Clark 2450 10

0A00030079030000 1589358 27-11-07 23:58:39 1589361
27-11-07 23:58:39 I 7788 SCOTT 3000 20

0400070075030000 1589361 27-11-07 23:58:39 1589365
27-11-07 23:58:39 U 7788 SCOTT 3300 20

0700110078030000 1589365 27-11-07 23:58:39 1589368
27-11-07 23:58:39 U 7788 Scott 3300 20

09001C00B5040000 1589368 27-11-07 23:58:39
D 7788 Scott 3300 20

0A00030079030000 1589358 27-11-07 23:58:39 1589365
27-11-07 23:58:39 I 7839 KING 5000 10

0700110078030000 1589365 27-11-07 23:58:39
U 7839 King 5000 10

0A00030079030000 1589358 27-11-07 23:58:39
I 7844 TURNER 1500 30

0A00030079030000 1589358 27-11-07 23:58:39 1589361
27-11-07 23:58:39 I 7876 ADAMS 1100 20

0400070075030000 1589361 27-11-07 23:58:39 1589365
27-11-07 23:58:39 U 7876 ADAMS 1210 20

0700110078030000 1589365 27-11-07 23:58:39
U 7876 Adams 1210 20

0A00030079030000 1589358 27-11-07 23:58:39
I 7900 JAMES 950 30

0A00030079030000 1589358 27-11-07 23:58:39 1589361
27-11-07 23:58:39 I 7902 FORD 3000 20

0400070075030000 1589361 27-11-07 23:58:39 1589365
27-11-07 23:58:39 U 7902 FORD 3300 20

0700110078030000 1589365 27-11-07 23:58:39 1589368
27-11-07 23:58:39 U 7902 Ford 3300 20

09001C00B5040000 1589368 27-11-07 23:58:39
D 7902 Ford 3300 20

0A00030079030000 1589358 27-11-07 23:58:39 1589365
27-11-07 23:58:39 I 7934 MILLER 1300 10

0700110078030000 1589365 27-11-07 23:58:39
U 7934 Miller 1300 10


30 rows selected.


So indeed all journaling information is available here. All but one piece: the user who modified the data is missing. So let's audit the table to capture that piece of information as well, and create a journaling view that joins the audit information with the journaling info.

rwijk@ORA11G> audit insert,update,delete on my_emp by access
2 /

Audit succeeded.

rwijk@ORA11G> create view my_emp_jn_v
2 as
3 select me.versions_operation jn_operation
4 , ao.username jn_user
5 , me.versions_starttime jn_datetime
6 , me.versions_startscn jn_scn
7 , me.empno
8 , me.ename
9 , me.sal
10 , me.deptno
11 from my_emp versions between scn minvalue and maxvalue me
12 , user_audit_object ao
13 where me.versions_xid = ao.transactionid (+)
14 /

View created.


And do some more transactions, including one transaction in another session by a user called SOMEBODY_ELSE:

rwijk@ORA11G> insert into my_emp (empno,ename,sal,deptno)
2 values (7778, 'ROB', 1700, 40)
3 /

1 row created.

rwijk@ORA11G> commit
2 /

Commit complete.

rwijk@ORA11G> create user somebody_else identified by somebody_else
2 /

User created.

rwijk@ORA11G> grant create session to somebody_else
2 /

Grant succeeded.

rwijk@ORA11G> grant delete on my_emp to somebody_else
2 /

Grant succeeded.

rwijk@ORA11G> remark Start another session and connect as somebody_else
rwijk@ORA11G> remark and issue the following commands:
rwijk@ORA11G> remark
rwijk@ORA11G> remark delete rwijk.my_emp where empno = 7777
rwijk@ORA11G> remark /
rwijk@ORA11G> remark commit
rwijk@ORA11G> remark /
rwijk@ORA11G> remark exit
rwijk@ORA11G> pause

rwijk@ORA11G> drop user somebody_else
2 /

User dropped.

rwijk@ORA11G> update my_emp
2 set sal = 1800
3 where empno = 7778
4 /

1 row updated.

rwijk@ORA11G> commit
2 /

Commit complete.


And look, the missing piece of information is also here:

rwijk@ORA11G> select *
2 from my_emp_jn_v
3 order by empno
4 , jn_scn
5 /

J JN_USER JN_DATETIME JN_SCN EMPNO ENAME SAL DEPTNO
- ------------- ------------------- -------- ----- ---------- ----- ------
I 28-11-07 00:05:03 1590771 7369 SMITH 800 20
U 28-11-07 00:05:03 1590774 7369 SMITH 880 20
U 28-11-07 00:05:03 1590778 7369 Smith 880 20
I 28-11-07 00:05:03 1590771 7499 ALLEN 1600 30
I 28-11-07 00:05:03 1590771 7521 WARD 1250 30
I 28-11-07 00:05:03 1590771 7566 JONES 2975 20
U 28-11-07 00:05:03 1590774 7566 JONES 3273 20
U 28-11-07 00:05:03 1590778 7566 Jones 3273 20
I 28-11-07 00:05:03 1590771 7654 MARTIN 1250 30
I 28-11-07 00:05:03 1590771 7698 BLAKE 2850 30
I 28-11-07 00:05:03 1590781 7777 VAN WIJK 1600 40
D SOMEBODY_ELSE 28-11-07 00:05:21 1590838 7777 VAN WIJK 1600 40
I RWIJK 28-11-07 00:05:06 1590799 7778 ROB 1700 40
U RWIJK 28-11-07 00:05:33 1590859 7778 ROB 1800 40
I 28-11-07 00:05:03 1590771 7782 CLARK 2450 10
U 28-11-07 00:05:03 1590778 7782 Clark 2450 10
I 28-11-07 00:05:03 1590771 7788 SCOTT 3000 20
U 28-11-07 00:05:03 1590774 7788 SCOTT 3300 20
U 28-11-07 00:05:03 1590778 7788 Scott 3300 20
D 28-11-07 00:05:03 1590781 7788 Scott 3300 20
I 28-11-07 00:05:03 1590771 7839 KING 5000 10
U 28-11-07 00:05:03 1590778 7839 King 5000 10
I 28-11-07 00:05:03 1590771 7844 TURNER 1500 30
I 28-11-07 00:05:03 1590771 7876 ADAMS 1100 20
U 28-11-07 00:05:03 1590774 7876 ADAMS 1210 20
U 28-11-07 00:05:03 1590778 7876 Adams 1210 20
I 28-11-07 00:05:03 1590771 7900 JAMES 950 30
I 28-11-07 00:05:03 1590771 7902 FORD 3000 20
U 28-11-07 00:05:03 1590774 7902 FORD 3300 20
U 28-11-07 00:05:03 1590778 7902 Ford 3300 20
D 28-11-07 00:05:03 1590781 7902 Ford 3300 20
I 28-11-07 00:05:03 1590771 7934 MILLER 1300 10
U 28-11-07 00:05:03 1590778 7934 Miller 1300 10

33 rows selected.


Note that I also included the SCN in the view, to be able to sort correctly even when several transactions occur within the same second.

So it looks very promising... well, until you read this warning in the documentation:

Remember that DDLs that alter the structure of a table (such as drop/modify column, move table, drop partition, truncate table/partition, and add constraint) invalidate any existing undo data for the table. If you try to retrieve data from a time before such a DDL executed, you will get error ORA-1466.


Now this is a serious showstopper: in my experience the tables that need journaling are also the tables that require regular maintenance. And if that maintenance invalidates the existing undo ...

Let's test it (and here is why I range partitioned the table):

rwijk@ORA11G> alter table my_emp drop column comm
2 /
alter table my_emp drop column comm
*
ERROR at line 1:
ORA-55610: Invalid DDL statement on history-tracked table


rwijk@ORA11G> alter table my_emp modify (ename varchar2(50))
2 /
alter table my_emp modify (ename varchar2(50))
*
ERROR at line 1:
ORA-55610: Invalid DDL statement on history-tracked table


rwijk@ORA11G> alter table my_emp drop partition deptno10
2 /
alter table my_emp drop partition deptno10
*
ERROR at line 1:
ORA-55610: Invalid DDL statement on history-tracked table


rwijk@ORA11G> truncate table my_emp
2 /
truncate table my_emp
*
ERROR at line 1:
ORA-55610: Invalid DDL statement on history-tracked table


So Oracle does not even allow these types of actions, and the existing undo is not harmed:

rwijk@ORA11G> select *
2 from my_emp_jn_v
3 order by empno
4 , jn_scn
5 /

J JN_USER JN_DATETIME JN_SCN EMPNO ENAME SAL DEPTNO
- ------------- ------------------- -------- ----- ---------- ----- ------
I 28-11-07 00:05:03 1590771 7369 SMITH 800 20
U 28-11-07 00:05:03 1590774 7369 SMITH 880 20
U 28-11-07 00:05:03 1590778 7369 Smith 880 20
I 28-11-07 00:05:03 1590771 7499 ALLEN 1600 30
I 28-11-07 00:05:03 1590771 7521 WARD 1250 30
I 28-11-07 00:05:03 1590771 7566 JONES 2975 20
U 28-11-07 00:05:03 1590774 7566 JONES 3273 20
U 28-11-07 00:05:03 1590778 7566 Jones 3273 20
I 28-11-07 00:05:03 1590771 7654 MARTIN 1250 30
I 28-11-07 00:05:03 1590771 7698 BLAKE 2850 30
I 28-11-07 00:05:03 1590781 7777 VAN WIJK 1600 40
D SOMEBODY_ELSE 28-11-07 00:05:21 1590838 7777 VAN WIJK 1600 40
I RWIJK 28-11-07 00:05:06 1590799 7778 ROB 1700 40
U RWIJK 28-11-07 00:05:33 1590859 7778 ROB 1800 40
I 28-11-07 00:05:03 1590771 7782 CLARK 2450 10
U 28-11-07 00:05:03 1590778 7782 Clark 2450 10
I 28-11-07 00:05:03 1590771 7788 SCOTT 3000 20
U 28-11-07 00:05:03 1590774 7788 SCOTT 3300 20
U 28-11-07 00:05:03 1590778 7788 Scott 3300 20
D 28-11-07 00:05:03 1590781 7788 Scott 3300 20
I 28-11-07 00:05:03 1590771 7839 KING 5000 10
U 28-11-07 00:05:03 1590778 7839 King 5000 10
I 28-11-07 00:05:03 1590771 7844 TURNER 1500 30
I 28-11-07 00:05:03 1590771 7876 ADAMS 1100 20
U 28-11-07 00:05:03 1590774 7876 ADAMS 1210 20
U 28-11-07 00:05:03 1590778 7876 Adams 1210 20
I 28-11-07 00:05:03 1590771 7900 JAMES 950 30
I 28-11-07 00:05:03 1590771 7902 FORD 3000 20
U 28-11-07 00:05:03 1590774 7902 FORD 3300 20
U 28-11-07 00:05:03 1590778 7902 Ford 3300 20
D 28-11-07 00:05:03 1590781 7902 Ford 3300 20
I 28-11-07 00:05:03 1590771 7934 MILLER 1300 10
U 28-11-07 00:05:03 1590778 7934 Miller 1300 10

33 rows selected.


Most of the time, maintenance consists of adding one or two columns, along with some additional constraints. Adding a column is not a problem for flashback archives, but adding constraints was also listed as impossible. Let's verify that as well, by adding a primary key, a unique key, a foreign key and a check constraint:

rwijk@ORA11G> alter table my_emp add constraint my_pk primary key (empno)
2 /

Table altered.

rwijk@ORA11G> alter table my_emp add constraint my_uk1 unique (ename)
2 /

Table altered.

rwijk@ORA11G> alter table my_emp add constraint my_fk1 foreign key (deptno) references dept(deptno)
2 /

Table altered.

rwijk@ORA11G> alter table my_emp add constraint ck1 check (sal>500)
2 /

Table altered.


So, the documentation was wrong? Is it possible after all?

No.

Because the undo has been sneakily invalidated:

rwijk@ORA11G> select *
2 from my_emp_jn_v
3 order by empno
4 , jn_scn
5 /

J JN_USER JN_DATETIME JN_SCN EMPNO ENAME SAL DEPTNO
- ------------- ------------------- -------- ----- ---------- ----- ------
7369 Smith 880 20
7499 ALLEN 1600 30
7521 WARD 1250 30
7566 Jones 3273 20
7654 MARTIN 1250 30
7698 BLAKE 2850 30
7778 ROB 1800 40
7782 Clark 2450 10
7839 King 5000 10
7844 TURNER 1500 30
7876 Adams 1210 20
7900 JAMES 950 30
7934 Miller 1300 10

13 rows selected.


So if you make the mistake of adding a constraint, you lose all journaling information... This is certainly something to be aware of. I think adding constraints to a table with a flashback archive should also raise an ORA-55610. Hopefully Oracle will fix this in a future release or patch set.

The only way I can think of to keep the journaling information while still being able to do the necessary maintenance is to create a journaling table into which you dump the information from the journaling view before invalidating the existing undo. Far from ideal, but doable as a workaround.
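As a rough sketch of that workaround (the backup table name is hypothetical, and you should verify the NO FLASHBACK ARCHIVE syntax against your own 11g instance), it could look like this:

```sql
-- Hedged sketch of the workaround; my_emp_jn_backup is a hypothetical name.
-- 1) Preserve the history gathered so far in a plain heap table:
create table my_emp_jn_backup
as
  select * from my_emp_jn_v;

-- 2) Detach the table from its flashback archive and do the maintenance:
alter table my_emp no flashback archive;
alter table my_emp add constraint my_pk primary key (empno);

-- 3) Re-attach; history tracking starts afresh from this point on:
alter table my_emp flashback archive flashback_archive_10_years;
```

Old history then lives in the backup table, and new history accumulates in the archive again, so a reporting view would have to union the two.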

But what do we gain by implementing journaling this way?

The answer is simple: performance. Database triggers are no longer necessary. And because database triggers are evil, you win a lot, as can be seen in a little test in which I create three tables: one regular table, one table using a flashback archive and auditing, and one regular table with an after-row trigger that populates a second journaling table:

rwijk@ORA11G> create flashback archive flashback_archive_10_years
2 tablespace my_tablespace
3 retention 10 year
4 /

Flashback archive created.

rwijk@ORA11G> create table my_emp_regular
2 ( empno number(6) not null
3 , ename varchar2(10)
4 , sal number(8,2)
5 , deptno number(2)
6 )
7 /

Table created.

rwijk@ORA11G> create table my_emp_flashback_archive
2 ( empno number(6) not null
3 , ename varchar2(10)
4 , sal number(8,2)
5 , deptno number(2)
6 ) flashback archive flashback_archive_10_years
7 /

Table created.

rwijk@ORA11G> audit insert,update,delete on my_emp_flashback_archive by access
2 /

Audit succeeded.

rwijk@ORA11G> create table my_emp_bad_old_journaling
2 ( empno number(6) not null
3 , ename varchar2(10)
4 , sal number(8,2)
5 , deptno number(2)
6 )
7 /

Table created.

rwijk@ORA11G> create table my_emp_bad_old_journaling_jn
2 ( jn_operation varchar2(1)
3 , jn_user varchar2(30)
4 , jn_datetime date
5 , jn_scn number
6 , empno number(6) not null
7 , ename varchar2(10)
8 , sal number(8,2)
9 , deptno number(2)
10 )
11 /

Table created.

rwijk@ORA11G> create trigger jn_ari
2 after insert on my_emp_bad_old_journaling
3 for each row
4 begin
5 insert into my_emp_bad_old_journaling_jn
6 ( jn_operation
7 , jn_user
8 , jn_datetime
9 , jn_scn
10 , empno
11 , ename
12 , sal
13 , deptno
14 )
15 values
16 ( 'I'
17 , user
18 , sysdate
19 , to_number(dbms_flashback.get_system_change_number)
20 , :new.empno
21 , :new.ename
22 , :new.sal
23 , :new.deptno
24 );
25 end;
26 /

Trigger created.

rwijk@ORA11G> set timing on
rwijk@ORA11G> insert into my_emp_regular
2 select level
3 , 'EMP' || to_char(level)
4 , 1000 + level
5 , 10 * (1 + mod(level,4))
6 from dual
7 connect by level <= 100000
8 /

100000 rows created.

Elapsed: 00:00:00.48
rwijk@ORA11G> insert into my_emp_flashback_archive
2 select level
3 , 'EMP' || to_char(level)
4 , 1000 + level
5 , 10 * (1 + mod(level,4))
6 from dual
7 connect by level <= 100000
8 /

100000 rows created.

Elapsed: 00:00:00.50
rwijk@ORA11G> insert into my_emp_bad_old_journaling
2 select level
3 , 'EMP' || to_char(level)
4 , 1000 + level
5 , 10 * (1 + mod(level,4))
6 from dual
7 connect by level <= 100000
8 /

100000 rows created.

Elapsed: 00:00:13.39


As expected, there is almost no overhead when using flashback data archives, making it a lot faster than the old-fashioned way of journaling. But the maintenance issue described earlier is something to worry about in advance.


Further reading:

Yuri van Buren about Flashback Data Archive

Oracle 11g Total Recall - Flashback in the hands of Database Designers and Application Developers, at last. And: the end of Journalling Tables by Lucas Jellema

Friday, November 16, 2007

SQL Model Unconference

In an "Unconference" session at Oracle Open World, Laurent Schneider did a session about the SQL Model Clause. He posted his demonstration script on his site here.

If you have been reading my SQL Model Clause Tutorial, part one and SQL Model Clause Tutorial, part two, then I can recommend running this script as well to see some interesting queries. You only need the EMP and DEPT tables. A nice catch for me was the use of a constant in the PARTITION BY clause.

Thursday, November 15, 2007

SAVE EXCEPTIONS

Since version 9 it is possible to do bulk DML using FORALL and use the SAVE EXCEPTIONS clause. It makes sure that all invalid rows - the exceptions - are saved into the sql%bulk_exceptions array, while all valid rows are still processed. This array stores a record for each invalid row, containing an ERROR_INDEX which is the iteration number during the FORALL statement and an ERROR_CODE which corresponds with the Oracle error code.

The error text is not stored. The documentation says:

The individual error messages, or any substitution arguments, are not saved, but the error message text can be looked up using ERROR_CODE with SQLERRM ...


Looks reasonable, but in our shop we validate lots of business rules with triggers. When a business rule is violated we do a RAISE_APPLICATION_ERROR(-20000,'APP-12345');

At the client side (Webforms) the error message is looked up in the messages table and a friendly message is shown. When coding a FORALL with SAVE EXCEPTIONS in such an environment, the error messages become useless, as can be seen in the next example:

rwijk@ORA11G> create table mytable
2 ( id number(4)
3 , name varchar2(30)
4 )
5 /

Table created.

rwijk@ORA11G> create trigger mytable_bri
2 before insert on mytable
3 for each row
4 begin
5 if :new.id = 2
6 then
7 raise_application_error(-20000,'APP-12345');
8 elsif :new.id = 9
9 then
10 raise_application_error(-20000,'APP-98765');
11 end if;
12 end;
13 /

Trigger created.

rwijk@ORA11G> alter table mytable add constraint mytable_ck1 check (id <> 6)
2 /

Table altered.

rwijk@ORA11G> declare
2 e_forall_error exception;
3 pragma exception_init(e_forall_error,-24381)
4 ;
5 type t_numbers is table of mytable.id%type;
6 l_numbers t_numbers := t_numbers(1,2,3,4,5,6,7,8,9,10)
7 ;
8 begin
9 forall i in 1..l_numbers.count save exceptions
10 insert into mytable
11 ( id
12 , name
13 )
14 values
15 ( l_numbers(i)
16 , 'Name' || to_char(l_numbers(i))
17 )
18 ;
19 exception
20 when e_forall_error then
21 for i in 1..sql%bulk_exceptions.count
22 loop
23 dbms_output.put_line('SQLCODE: ' || sql%bulk_exceptions(i).error_code);
24 dbms_output.put_line('SQLERRM: ' || sqlerrm(-sql%bulk_exceptions(i).error_code));
25 dbms_output.new_line;
26 end loop;
27 end;
28 /
SQLCODE: 20000
SQLERRM: ORA-20000:

SQLCODE: 2290
SQLERRM: ORA-02290: check constraint (.) violated

SQLCODE: 20000
SQLERRM: ORA-20000:


PL/SQL procedure successfully completed.

rwijk@ORA11G> select id, name from mytable
2 /

ID NAME
---------- ------------------------------
1 Name1
3 Name3
4 Name4
5 Name5
7 Name7
8 Name8
10 Name10

7 rows selected.


Note how the SQLERRM message doesn't return anything useful, and that the name of the check constraint has disappeared. This is really annoying and can't be circumvented easily in 9i. For better error messages we would have to go back to row-by-row processing. And that means: very slow.
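For completeness, the 9i fallback is the classic row-by-row loop, trading speed for full error messages. A minimal sketch, reusing the l_numbers collection from the example above:

```sql
-- Row-by-row fallback: slow, but SQLERRM keeps the full message text.
declare
  type t_numbers is table of mytable.id%type;
  l_numbers t_numbers := t_numbers(1,2,3,4,5,6,7,8,9,10);
begin
  for i in 1 .. l_numbers.count
  loop
    begin
      insert into mytable (id, name)
      values (l_numbers(i), 'Name' || to_char(l_numbers(i)));
    exception
      when others then
        -- Full message, including the APP-12345 style argument:
        dbms_output.put_line(sqlerrm);
    end;
  end loop;
end;
/
```

One context switch per row instead of one per statement is exactly the cost that FORALL was meant to avoid.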

However, version 10gR2 introduced a feature called DML error logging. I remember reading about it more than two years ago here on Tom Kyte's blog. In this entry and in the documentation you only see examples using SQL, not PL/SQL examples using FORALL. But luckily this works as well:

rwijk@ORA11G> rollback
2 /

Rollback complete.

rwijk@ORA11G> exec dbms_errlog.create_error_log('mytable')

PL/SQL procedure successfully completed.

rwijk@ORA11G> declare
2 type t_numbers is table of mytable.id%type;
3 l_numbers t_numbers := t_numbers(1,2,3,4,5,6,7,8,9,10)
4 ;
5 begin
6 forall i in 1..l_numbers.count
7 insert into mytable
8 ( id
9 , name
10 )
11 values
12 ( l_numbers(i)
13 , 'Name' || to_char(l_numbers(i))
14 )
15 log errors reject limit unlimited
16 ;
17 for r in
18 ( select ora_err_number$
19 , ora_err_mesg$
20 from err$_mytable
21 )
22 loop
23 dbms_output.put_line('SQLCODE: ' || to_char(r.ora_err_number$));
24 dbms_output.put_line('SQLERRM: ' || r.ora_err_mesg$);
25 dbms_output.new_line;
26 end loop
27 ;
28 end;
29 /
SQLCODE: 20000
SQLERRM: ORA-20000: APP-12345
ORA-06512: at "RWIJK.MYTABLE_BRI", line 4
ORA-04088: error during execution of trigger 'RWIJK.MYTABLE_BRI'
.


SQLCODE: 2290
SQLERRM: ORA-02290: check constraint (RWIJK.MYTABLE_CK1) violated


SQLCODE: 20000
SQLERRM: ORA-20000: APP-98765
ORA-06512: at "RWIJK.MYTABLE_BRI", line 7
ORA-04088: error during execution of trigger 'RWIJK.MYTABLE_BRI'
.



PL/SQL procedure successfully completed.

rwijk@ORA11G> select id, name from mytable
2 /

ID NAME
---------- ------------------------------
1 Name1
3 Name3
4 Name4
5 Name5
7 Name7
8 Name8
10 Name10

7 rows selected.


And you do get to see the error messages (APP-12345 and APP-98765) and the name of the check constraint. Unfortunately, our shop still uses 9.2.0.7...



UPDATE



It is worth knowing that there are some restrictions when working with the DML error logging clause. From the 11g documentation (same as in 10gR2 documentation):

Restrictions on DML Error Logging

* The following conditions cause the statement to fail and roll back without invoking the error logging capability:

o Violated deferred constraints.

o Any direct-path INSERT or MERGE operation that raises a unique constraint or index violation.

o Any update operation UPDATE or MERGE that raises a unique constraint or index violation.

* You cannot track errors in the error logging table for LONG, LOB, or object type columns. However, the table that is the target of the DML operation can contain these types of columns.

o If you create or modify the corresponding error logging table so that it contains a column of an unsupported type, and if the name of that column corresponds to an unsupported column in the target DML table, then the DML statement fails at parse time.

o If the error logging table does not contain any unsupported column types, then all DML errors are logged until the reject limit of errors is reached. For rows on which errors occur, column values with corresponding columns in the error logging table are logged along with the control information.

Wednesday, November 14, 2007

Interval based row generation

Well, I didn't have a good name for this common SQL problem, so I decided to call it "interval based row generation". If someone out there knows a better or more common name for the problem, then please say so by leaving a comment.

An example says more than a thousand words, so here is an example of what I mean by the term:

Suppose I have this table:

rwijk@ORA10GR2> create table mytable (code,start_date,end_date,value)
2 as
3 select 'group1', date '2007-01-01', date '2007-01-10', 11 from dual union all
4 select 'group2', date '2007-01-01', date '2007-01-01', 22 from dual union all
5 select 'group3', date '2007-01-01', date '2007-01-03', 33 from dual
6 /

Table created.


And I want to generate rows for all dates between and including the start_date and the end_date. The query would have to generate 10 rows for group1, 1 row for group2 and 3 rows for group3. I know of six ways to do this type of query. There may be more of course. Here are the six alternatives:

rwijk@ORA10GR2> select /* alternative 1: cartesian product with hard coded upper bound */
2 m.code
3 , m.start_date + l as day
4 , m.value
5 from mytable m
6 , (select level-1 l from dual connect by level <= 1000)
7 where l <= m.end_date - m.start_date
8 order by m.code
9 , day
10 /

CODE DAY VALUE
---------- ------------------- ----------
group1 01-01-2007 00:00:00 11
group1 02-01-2007 00:00:00 11
group1 03-01-2007 00:00:00 11
group1 04-01-2007 00:00:00 11
group1 05-01-2007 00:00:00 11
group1 06-01-2007 00:00:00 11
group1 07-01-2007 00:00:00 11
group1 08-01-2007 00:00:00 11
group1 09-01-2007 00:00:00 11
group1 10-01-2007 00:00:00 11
group2 01-01-2007 00:00:00 22
group3 01-01-2007 00:00:00 33
group3 02-01-2007 00:00:00 33
group3 03-01-2007 00:00:00 33

14 rows selected.

rwijk@ORA10GR2> select /* alternative 2: cartesian product with calculated upper bound */
2 m.code
3 , m.start_date + l as day
4 , m.value
5 from mytable m
6 , ( select level-1 l
7 from (select max(end_date-start_date+1) maxinterval from mytable)
8 connect by level <= maxinterval
9 )
10 where l <= m.end_date - m.start_date
11 order by m.code
12 , day
13 /

CODE DAY VALUE
---------- ------------------- ----------
group1 01-01-2007 00:00:00 11
group1 02-01-2007 00:00:00 11
group1 03-01-2007 00:00:00 11
group1 04-01-2007 00:00:00 11
group1 05-01-2007 00:00:00 11
group1 06-01-2007 00:00:00 11
group1 07-01-2007 00:00:00 11
group1 08-01-2007 00:00:00 11
group1 09-01-2007 00:00:00 11
group1 10-01-2007 00:00:00 11
group2 01-01-2007 00:00:00 22
group3 01-01-2007 00:00:00 33
group3 02-01-2007 00:00:00 33
group3 03-01-2007 00:00:00 33

14 rows selected.

rwijk@ORA10GR2> select /* alternative 3: hierarchical query with prior dbms_random.value */
2 m.code
3 , m.start_date + level - 1 as day
4 , m.value
5 from mytable m
6 connect by level <= m.end_date - m.start_date + 1
7 and prior m.code = m.code
8 and prior dbms_random.value is not null
9 order by m.code
10 , day
11 /

CODE DAY VALUE
---------- ------------------- ----------
group1 01-01-2007 00:00:00 11
group1 02-01-2007 00:00:00 11
group1 03-01-2007 00:00:00 11
group1 04-01-2007 00:00:00 11
group1 05-01-2007 00:00:00 11
group1 06-01-2007 00:00:00 11
group1 07-01-2007 00:00:00 11
group1 08-01-2007 00:00:00 11
group1 09-01-2007 00:00:00 11
group1 10-01-2007 00:00:00 11
group2 01-01-2007 00:00:00 22
group3 01-01-2007 00:00:00 33
group3 02-01-2007 00:00:00 33
group3 03-01-2007 00:00:00 33

14 rows selected.

rwijk@ORA10GR2> select /* alternative 4: hierarchical query with connect_by_root */
2 m.code
3 , m.start_date + level - 1 as day
4 , m.value
5 from mytable m
6 connect by connect_by_root m.code = m.code
7 and level <= m.end_date - m.start_date + 1
8 order by m.code
9 , day
10 /

CODE DAY VALUE
---------- ------------------- ----------
group1 01-01-2007 00:00:00 11
group1 02-01-2007 00:00:00 11
group1 03-01-2007 00:00:00 11
group1 04-01-2007 00:00:00 11
group1 05-01-2007 00:00:00 11
group1 06-01-2007 00:00:00 11
group1 07-01-2007 00:00:00 11
group1 08-01-2007 00:00:00 11
group1 09-01-2007 00:00:00 11
group1 10-01-2007 00:00:00 11
group2 01-01-2007 00:00:00 22
group3 01-01-2007 00:00:00 33
group3 02-01-2007 00:00:00 33
group3 03-01-2007 00:00:00 33

14 rows selected.

rwijk@ORA10GR2> select /* alternative 5: Adrian Billington's multiset */
2 m.code
3 , m.start_date + (row_number() over (partition by m.code order by null)-1) as day
4 , m.value
5 from mytable m
6 , table
7 ( cast
8 ( multiset(select null from dual connect by rownum <= (m.end_date-m.start_date)+1)
9 as sys.dbms_debug_vc2coll
10 )
11 ) t
12 order by m.code
13 , day
14 /

CODE DAY VALUE
---------- ------------------- ----------
group1 01-01-2007 00:00:00 11
group1 02-01-2007 00:00:00 11
group1 03-01-2007 00:00:00 11
group1 04-01-2007 00:00:00 11
group1 05-01-2007 00:00:00 11
group1 06-01-2007 00:00:00 11
group1 07-01-2007 00:00:00 11
group1 08-01-2007 00:00:00 11
group1 09-01-2007 00:00:00 11
group1 10-01-2007 00:00:00 11
group2 01-01-2007 00:00:00 22
group3 01-01-2007 00:00:00 33
group3 02-01-2007 00:00:00 33
group3 03-01-2007 00:00:00 33

14 rows selected.

rwijk@ORA10GR2> select /* alternative 6: model clause */
2 code
3 , day
4 , value
5 from mytable
6 model
7 partition by (code,value)
8 dimension by (0 i)
9 measures (start_date day, end_date)
10 rules
11 ( day[for i from 1 to end_date[0] - day[0] increment 1] = day[0] + cv(i)
12 )
13 order by code
14 , day
15 /

CODE DAY VALUE
---------- ------------------- ----------
group1 01-01-2007 00:00:00 11
group1 02-01-2007 00:00:00 11
group1 03-01-2007 00:00:00 11
group1 04-01-2007 00:00:00 11
group1 05-01-2007 00:00:00 11
group1 06-01-2007 00:00:00 11
group1 07-01-2007 00:00:00 11
group1 08-01-2007 00:00:00 11
group1 09-01-2007 00:00:00 11
group1 10-01-2007 00:00:00 11
group2 01-01-2007 00:00:00 22
group3 01-01-2007 00:00:00 33
group3 02-01-2007 00:00:00 33
group3 03-01-2007 00:00:00 33

14 rows selected.


I noticed something remarkable when running the same queries on my Oracle 11g database. Alternatives 1, 2, 5 and 6 ran as expected, but 3 and 4 no longer worked:

rwijk@ORA11G>  select /* alternative 3: hierarchical query with prior dbms_random.value */
2 m.code
3 , m.start_date + level - 1 as day
4 , m.value
5 from mytable m
6 connect by level <= m.end_date - m.start_date + 1
7 and prior m.code = m.code
8 and prior dbms_random.value is not null
9 order by m.code
10 , day
11 /
from mytable m
*
ERROR at line 5:
ORA-01436: CONNECT BY loop in user data


rwijk@ORA11G> select /* alternative 4: hierarchical query with connect_by_root */
2 m.code
3 , m.start_date + level - 1 as day
4 , m.value
5 from mytable m
6 connect by connect_by_root m.code = m.code
7 and level <= m.end_date - m.start_date + 1
8 order by m.code
9 , day
10 /
connect by connect_by_root m.code = m.code
*
ERROR at line 6:
ORA-30007: CONNECT BY ROOT operator is not supported in the START WITH or CONNECT BY condition


Alternative 3 uses a trick: the condition on line 8 ("prior dbms_random.value is not null") prevented the ORA-01436 connect by loop in user data error in versions 9 and 10, because the non-deterministic function call kept Oracle from concluding the hierarchy contains a loop. In version 11 the trick no longer works.

Alternative 4 shows a new error message introduced in Oracle11g: ORA-30007. Apparently the use of the connect_by_root function has been restricted further. I don't know why, although Oracle probably has a good reason for the restriction.
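For what it's worth, a variant of the alternative 3 trick that reportedly still works on 11g swaps dbms_random.value for another non-deterministic function, sys_guid(). This is not one of the original alternatives and I present it only as a sketch, not a verified solution:

```sql
-- Sketch of a possible 11g-compatible variant of alternative 3:
-- sys_guid() is non-deterministic as well, and is said to still
-- prevent the ORA-01436 connect-by-loop detection from kicking in.
select m.code
,      m.start_date + level - 1 as day
,      m.value
from   mytable m
connect by level <= m.end_date - m.start_date + 1
and    prior m.code = m.code
and    prior sys_guid() is not null
order by m.code
,      day
/
```

The idea is the same as with dbms_random.value: a non-deterministic expression after PRIOR stops Oracle from proving that the same row repeats, so no loop is detected.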

In this recent OTN-thread Adrian Billington (of the excellent oracle-developer.net site) showed me alternative 5, which I had not seen before. He also did a small performance test showing that alternative 5 is slightly better than alternative 6, the model clause solution. I decided to investigate a little more by tracing/tkprof and by comparing latches with Tom Kyte's runstats_pkg.

Here are the results of a test with this table:

create table mytable (code,start_date,end_date,value)
as
select 'group' || to_char(level)
, trunc(sysdate)
, trunc(sysdate) + power(ceil(dbms_random.value(0,30)),2)
, 'value' || to_char(level)
from dual
connect by level <= 100
/


Tkprof on Oracle11g shows:

********************************************************************************

select /* alternative 1: cartesian product with hard coded upper bound */
m.code
, m.start_date + l as day
, m.value
from mytable m
, (select level-1 l from dual connect by level <= 1000)
where l <= m.end_date - m.start_date
order by m.code
, day

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 1 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2415 0.31 0.29 0 3000 0 36197
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2417 0.31 0.29 0 3001 0 36197

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 81

Rows Row Source Operation
------- ---------------------------------------------------
36197 SORT ORDER BY (cr=3000 pr=0 pw=0 time=806 us cost=6 size=395 card=5)
36197 NESTED LOOPS (cr=3000 pr=0 pw=0 time=2220 us cost=5 size=395 card=5)
1000 VIEW (cr=0 pr=0 pw=0 time=67 us cost=2 size=13 card=1)
1000 CONNECT BY WITHOUT FILTERING (cr=0 pr=0 pw=0 time=27 us)
1 FAST DUAL (cr=0 pr=0 pw=0 time=0 us cost=2 size=0 card=1)
36197 TABLE ACCESS FULL MYTABLE (cr=3000 pr=0 pw=0 time=4259 us cost=3 size=330 card=5)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2415 0.00 0.00
SQL*Net message from client 2415 0.00 0.52
********************************************************************************

select /* alternative 2: cartesian product with calculated upper bound */
m.code
, m.start_date + l as day
, m.value
from mytable m
, ( select level-1 l
from (select max(end_date-start_date+1) maxinterval from mytable)
connect by level <= maxinterval
)
where l <= m.end_date - m.start_date
order by m.code
, day

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 3 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2415 0.26 0.28 0 2706 0 36197
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2417 0.28 0.29 0 2709 0 36197

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 81

Rows Row Source Operation
------- ---------------------------------------------------
36197 SORT ORDER BY (cr=2706 pr=0 pw=0 time=797 us cost=7 size=395 card=5)
36197 NESTED LOOPS (cr=2706 pr=0 pw=0 time=2251 us cost=6 size=395 card=5)
901 VIEW (cr=3 pr=0 pw=0 time=66 us cost=3 size=13 card=1)
901 CONNECT BY WITHOUT FILTERING (cr=3 pr=0 pw=0 time=28 us)
1 VIEW (cr=3 pr=0 pw=0 time=0 us cost=3 size=13 card=1)
1 SORT AGGREGATE (cr=3 pr=0 pw=0 time=0 us)
100 TABLE ACCESS FULL MYTABLE (cr=3 pr=0 pw=0 time=2 us cost=3 size=1800 card=100)
36197 TABLE ACCESS FULL MYTABLE (cr=2703 pr=0 pw=0 time=4266 us cost=3 size=330 card=5)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2415 0.00 0.00
SQL*Net message from client 2415 0.00 0.52
********************************************************************************

select /* alternative 5: Adrian Billington's multiset */
m.code
, m.start_date + (row_number() over (partition by m.code order by null)-1) as day
, m.value
from mytable m
, table
( cast
( multiset(select null from dual connect by rownum <= (m.end_date-m.start_date)+1)
as sys.dbms_debug_vc2coll
)
) t
order by m.code
, day

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 1 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2415 0.35 0.36 0 3 0 36197
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2417 0.35 0.36 0 4 0 36197

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 81

Rows Row Source Operation
------- ---------------------------------------------------
36197 SORT ORDER BY (cr=3 pr=0 pw=0 time=749 us cost=28444 size=53908800 card=816800)
36197 WINDOW SORT (cr=3 pr=0 pw=0 time=902 us cost=28444 size=53908800 card=816800)
36197 NESTED LOOPS (cr=3 pr=0 pw=0 time=4381 us cost=2726 size=53908800 card=816800)
100 TABLE ACCESS FULL MYTABLE (cr=3 pr=0 pw=0 time=4 us cost=3 size=6600 card=100)
36197 COLLECTION ITERATOR SUBQUERY FETCH (cr=0 pr=0 pw=0 time=4111 us)
36197 COUNT (cr=0 pr=0 pw=0 time=2600 us)
36197 CONNECT BY WITHOUT FILTERING (cr=0 pr=0 pw=0 time=962 us)
100 FAST DUAL (cr=0 pr=0 pw=0 time=0 us cost=2 size=0 card=1)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2415 0.00 0.00
SQL*Net message from client 2415 0.00 0.53
********************************************************************************

select /* alternative 6: model clause */
code
, day
, value
from mytable
model
partition by (code,value)
dimension by (0 i)
measures (start_date day, end_date)
rules
( day[for i from 1 to end_date[0] - day[0] increment 1] = day[0] + cv(i)
)
order by code
, day

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 1 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2415 0.46 0.51 0 3 0 36197
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2417 0.46 0.51 0 4 0 36197

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 81

Rows Row Source Operation
------- ---------------------------------------------------
36197 SORT ORDER BY (cr=3 pr=0 pw=0 time=771 us cost=4 size=6600 card=100)
36197 SQL MODEL ORDERED (cr=3 pr=0 pw=0 time=697 us cost=4 size=6600 card=100)
100 TABLE ACCESS FULL MYTABLE (cr=3 pr=0 pw=0 time=3 us cost=3 size=6600 card=100)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2415 0.00 0.00
SQL*Net message from client 2415 0.00 0.53

********************************************************************************


So, the clunky cartesian joins - alternatives 1 and 2 - seem to outperform the multiset and model clause solutions. These alternatives simply generate all rows that could possibly be needed and filter out the unneeded ones.

Next I compared alternative 2 with 5 using runstats-pkg:

rwijk@ORA11G> exec runstats_pkg.rs_stop(50)
Run1 ran in 78 hsecs
Run2 ran in 86 hsecs
Run1 ran in 90.7% of the time

Name Run1 Run2 Difference
LATCH.shared pool 156 222 66
LATCH.enqueues 80 14 -66
LATCH.enqueue hash chains 82 15 -67
STAT.undo change vector size 2,732 2,804 72
STAT.redo size 3,656 3,740 84
STAT.sorts (memory) 2 102 100
STAT.workarea executions - optimal 6 107 101
STAT.no work - consistent read gets 910 9 -901
STAT.table scan blocks gotten 902 1 -901
STAT.table scans (short tables) 902 1 -901
STAT.calls to get snapshot scn: kcmgss 1,811 9 -1,802
LATCH.cache buffers chains 2,856 163 -2,693
STAT.session logical reads 2,753 53 -2,700
STAT.consistent gets 2,727 26 -2,701
STAT.consistent gets from cache 2,727 26 -2,701
STAT.consistent gets from cache (fastpath) 2,720 18 -2,702
STAT.sorts (rows) 36,207 72,512 36,305
STAT.table scan rows gotten 90,200 100 -90,100

Run1 latches total versus run2 -- difference and percentage
Run1 Run2 Difference Pct
8,692 5,907 -2,785 147.15%


So alternative 2 is faster in terms of wall clock time, but less scalable than alternative 5, as the latch totals show.

Now comparing 5 and 6:

rwijk@ORA11G> exec runstats_pkg.rs_stop(50)
Run1 ran in 73 hsecs
Run2 ran in 89 hsecs
Run1 ran in 82.02% of the time

Name Run1 Run2 Difference
STAT.undo change vector size 2,868 2,804 -64
LATCH.SQL memory manager workarea list latch 74 9 -65
STAT.bytes received via SQL*Net from client 24,549 24,484 -65
LATCH.enqueues 2 76 74
LATCH.enqueue hash chains 3 78 75
STAT.redo size 3,876 3,784 -92
STAT.workarea executions - optimal 107 6 -101
STAT.sorts (memory) 102 1 -101
STAT.sorts (rows) 62,756 31,328 -31,428

Run1 latches total versus run2 -- difference and percentage
Run1 Run2 Difference Pct
4,700 4,811 111 97.69%


The conclusion is that Adrian's multiset solution is the winner, albeit by a small margin. The cartesian product solution appeals more when the spread in interval lengths is small and the percentage of filtered rows is low. The model clause solution only wins in readability, I guess, although some of you probably won't agree :-)

Monday, November 12, 2007

UPSERT ALL

A few weeks ago, I received a question from a BI-colleague who had a problem with a query using the model clause in a materialized view. He wanted to generate new sets of data according to some rules, but he got stuck. The query he used was huge: approximately 680 packed lines containing a model with 100+ rules. It used 12 dimension columns and operated on a table containing 90K rows.

The question was very interesting though and I’ll try to simulate it with a smaller table like this:

SQL> create table mytable (code, col1, col2, col3, value)
2 as
3 select 'MW001', 1, 1, 1, 11 from dual union all
4 select 'MW001', 1, 1, 2, 12 from dual union all
5 select 'MW001', 1, 1, 3, 13 from dual union all
6 select 'MW001', 1, 1, 4, 14 from dual union all
7 select 'MW002', 1, 1, 1, 21 from dual union all
8 select 'MW002', 1, 1, 2, 22 from dual union all
9 select 'MW002', 1, 1, 3, 23 from dual union all
10 select 'MW002', 1, 1, 4, 24 from dual union all
11 select 'MW003', 1, 1, 1, 31 from dual union all
12 select 'MW003', 1, 1, 2, 32 from dual union all
13 select 'MW003', 1, 1, 3, 33 from dual union all
14 select 'MW003', 1, 1, 4, 34 from dual union all
15 select 'MW004', 1, 1, 1, 41 from dual union all
16 select 'MW004', 1, 1, 2, 42 from dual union all
17 select 'MW004', 1, 1, 3, 43 from dual union all
18 select 'MW004', 1, 1, 4, 44 from dual union all
19 select 'MW005', 1, 1, 1, 51 from dual union all
20 select 'MW005', 1, 1, 2, 52 from dual union all
21 select 'MW005', 1, 1, 3, 53 from dual union all
22 select 'MW005', 1, 1, 4, 54 from dual union all
23 select 'MW011', 11, 1, 1, 11 from dual union all
24 select 'MW011', 11, 1, 2, 12 from dual union all
25 select 'MW011', 11, 1, 3, 13 from dual union all
26 select 'MW012', 11, 1, 1, 21 from dual union all
27 select 'MW012', 11, 1, 2, 22 from dual union all
28 select 'MW012', 11, 1, 3, 23 from dual union all
29 select 'MW013', 11, 1, 1, 31 from dual union all
30 select 'MW013', 11, 1, 2, 32 from dual union all
31 select 'MW013', 11, 1, 3, 33 from dual
32 /

Table created.


In this table there are 8 sets of data, identified by the column called code. The col1, col2 and col3 values identify the rows within a set; together with the code column they can be considered the primary key. The purpose is to generate new sets of data, using the same key values for col1, col2 and col3, but with a simple formula to calculate the value column. In the example I will use the following rules to simulate the original problem:

MW006 == MW001 + MW002
MW007 == MW006 - MW003
MW008 == MW004 + MW005
MW009 == MW007 + MW008
MW014 == MW011 + MW012
MW015 == MW013 + MW014

These rules would have to lead to the following extra rows:

 CODE        COL1       COL2       COL3      VALUE
----- ---------- ---------- ---------- ----------
MW006 1 1 1 32
MW006 1 1 2 34
MW006 1 1 3 36
MW006 1 1 4 38
MW007 1 1 1 1
MW007 1 1 2 2
MW007 1 1 3 3
MW007 1 1 4 4
MW008 1 1 1 92
MW008 1 1 2 94
MW008 1 1 3 96
MW008 1 1 4 98
MW009 1 1 1 93
MW009 1 1 2 96
MW009 1 1 3 99
MW009 1 1 4 102
MW014 11 1 1 32
MW014 11 1 2 34
MW014 11 1 3 36
MW015 11 1 1 63
MW015 11 1 2 66
MW015 11 1 3 69


The value 32 in the first row of the required output is the result of 11 + 21, 34 = 12 + 22, and so on. Note that the col1, col2 and col3 values match exactly within each set.

If you look back at the create table statement, you will notice that there are two set types: MW001 – MW005, all having four identical key values, and MW011 – MW013, having three identical key values. Of course it is only possible to combine sets of the same set type in the rules. This is a precondition for this problem and not something we have to validate.

My colleague had read the documentation very carefully and used the construction recommended in it. Here is a link to that part of the documentation; just scroll down a little bit. He used the construct described in the part about adding new sales for Poland, which resulted in this query:

SQL> select *
2 from mytable
3 model
4 return updated rows
5 dimension by (code, col1, col2, col3)
6 measures (value)
7 rules
8 ( value
9 [ for (code, col1, col2, col3) in
10 ( select distinct 'MW006', col1, col2, col3
11 from mytable
12 where code = 'MW001'
13 )
14 ] = value['MW001', cv(), cv(), cv()]
15 + value['MW002', cv(), cv(), cv()]
16 )
17 order by code
18 , col1
19 , col2
20 , col3
21 /

CODE COL1 COL2 COL3 VALUE
----- ---------- ---------- ---------- ----------
MW006 1 1 1 32
MW006 1 1 2 34
MW006 1 1 3 36
MW006 1 1 4 38

4 rows selected.


My colleague got stuck with this problem because he used something similar to the query below for the MW007 rule:

SQL> select *
2 from mytable
3 model
4 return updated rows
5 dimension by (code, col1, col2, col3)
6 measures (value)
7 rules
8 ( -- MW006
9 value
10 [ for (code, col1, col2, col3) in
11 ( select distinct 'MW006', col1, col2, col3
12 from mytable
13 where code = 'MW001'
14 )
15 ] = value['MW001', cv(), cv(), cv()]
16 + value['MW002', cv(), cv(), cv()]
17 , -- MW007
18 value
19 [ for (code, col1, col2, col3) in
20 ( select distinct 'MW007', col1, col2, col3
21 from mytable
22 where code = 'MW006'
23 )
24 ] = value['MW006', cv(), cv(), cv()]
25 - value['MW003', cv(), cv(), cv()]
26 )
27 order by code
28 , col1
29 , col2
30 , col3
31 /

CODE COL1 COL2 COL3 VALUE
----- ---------- ---------- ---------- ----------
MW006 1 1 1 32
MW006 1 1 2 34
MW006 1 1 3 36
MW006 1 1 4 38

4 rows selected.


And the four rows of set MW007 are not generated. It does not work because the model clause does not insert rows into the table: it only generates a result set. Selecting rows “where code = ‘MW006’” will therefore never retrieve a row, so this line of thought is clearly wrong. To make this query work, it is absolutely necessary to know which base set should be used to generate the rows. If we change the query for the second rule to use the predicate “code = ‘MW001’” instead, the query works:

SQL> select *
2 from mytable
3 model
4 return updated rows
5 dimension by (code, col1, col2, col3)
6 measures (value)
7 rules
8 ( -- MW006
9 value
10 [ for (code, col1, col2, col3) in
11 ( select distinct 'MW006', col1, col2, col3
12 from mytable
13 where code = 'MW001'
14 )
15 ] = value['MW001', cv(), cv(), cv()]
16 + value['MW002', cv(), cv(), cv()]
17 , -- MW007
18 value
19 [ for (code, col1, col2, col3) in
20 ( select distinct 'MW007', col1, col2, col3
21 from mytable
22 where code = 'MW001'
23 )
24 ] = value['MW006', cv(), cv(), cv()]
25 - value['MW003', cv(), cv(), cv()]
26 )
27 order by code
28 , col1
29 , col2
30 , col3
31 /

CODE COL1 COL2 COL3 VALUE
----- ---------- ---------- ---------- ----------
MW006 1 1 1 32
MW006 1 1 2 34
MW006 1 1 3 36
MW006 1 1 4 38
MW007 1 1 1 1
MW007 1 1 2 2
MW007 1 1 3 3
MW007 1 1 4 4

8 rows selected.


But why does Oracle advise using a separate query to generate each new set? It is an extra query that has to be executed, and in the original case there would be 100+ of them. If it were the only way to achieve the result, fine, but it's not: this is what the UPSERT ALL clause is all about! In my experience the UPSERT ALL clause is rarely used, but this is a showcase for it. However, you have to know exactly how it works, or you might get surprised by this behaviour:

SQL> select *
2 from mytable
3 model
4 return updated rows
5 dimension by (code, col1, col2, col3)
6 measures (value)
7 rules upsert all
8 ( value ['MW006', col1, col2, col3]
9 = value['MW001', cv(), cv(), cv()]
10 + value['MW002', cv(), cv(), cv()]
11 )
12 order by code
13 , col1
14 , col2
15 , col3
16 /

CODE COL1 COL2 COL3 VALUE
----- ---------- ---------- ---------- ----------
MW006 1 1 1 32
MW006 1 1 2 34
MW006 1 1 3 36
MW006 1 1 4 38
MW006 11 1 1
MW006 11 1 2
MW006 11 1 3

7 rows selected.


Here 7 rows are selected instead of 4 because, as documented, all unique combinations that satisfy the non-positional references (col1, col2 and col3) are identified. Since I want only four rows, I have to introduce an extra dimension column code2, based on code:

SQL> select *
2 from mytable
3 model
4 return updated rows
5 dimension by (code, code code2, col1, col2, col3)
6 measures (value)
7 rules upsert all
8 ( value ['MW006', code2 = 'MW001', col1, col2, col3]
9 = value['MW001', 'MW001', cv(), cv(), cv()]
10 + value['MW002', 'MW002', cv(), cv(), cv()]
11 )
12 order by code
13 , col1
14 , col2
15 , col3
16 /

CODE CODE2 COL1 COL2 COL3 VALUE
----- ----- ---------- ---------- ---------- ----------
MW006 MW001 1 1 1 32
MW006 MW001 1 1 2 34
MW006 MW001 1 1 3 36
MW006 MW001 1 1 4 38

4 rows selected.


For this solution, just as for the solution using a query per rule, it is absolutely necessary to know against which base set the rows should be generated. This is a tedious task in a situation with 100+ rules, but there is no way around it. At least a lot of inner queries are avoided by using UPSERT ALL.

I ended up suggesting a third alternative. Looking back on it, however, I am not at all happy with that advice, because the third alternative does not remove the need to know the base set either. I guess I was so disappointed about not being able to avoid knowing the base set that I turned down the UPSERT ALL variant altogether.
The third alternative was this:

SQL> select *
2 from mytable
3 model
4 return updated rows
5 reference r on (select code, count(*) countcode from mytable group by code)
6 dimension by (code coderef)
7 measures (countcode)
8 main m
9 dimension by (code, row_number() over (partition by code order by col1, col2, col3) rn)
10 measures (col1, col2, col3, value)
11 rules
12 ( col1['MW006',for rn from 1 to countcode['MW001'] increment 1] = col1['MW001',cv()]
13 , col2['MW006',any] = col2['MW001',cv()]
14 , col3['MW006',any] = col3['MW001',cv()]
15 , value['MW006',any] = value['MW001',cv()] + value['MW002',cv()]
16 , col1['MW007',for rn from 1 to countcode['MW001'] increment 1] = col1['MW001',cv()]
17 , col2['MW007',any] = col2['MW001',cv()]
18 , col3['MW007',any] = col3['MW001',cv()]
19 , value['MW007',any] = value['MW006',cv()] - value['MW003',cv()]
20 , col1['MW008',for rn from 1 to countcode['MW004'] increment 1] = col1['MW001',cv()]
21 , col2['MW008',any] = col2['MW001',cv()]
22 , col3['MW008',any] = col3['MW001',cv()]
23 , value['MW008',any] = value['MW004',cv()] + value['MW005',cv()]
24 , col1['MW009',for rn from 1 to countcode['MW001'] increment 1] = col1['MW001',cv()]
25 , col2['MW009',any] = col2['MW001',cv()]
26 , col3['MW009',any] = col3['MW001',cv()]
27 , value['MW009',any] = value['MW007',cv()] + value['MW008',cv()]
28 , col1['MW014',for rn from 1 to countcode['MW011'] increment 1] = col1['MW011',cv()]
29 , col2['MW014',any] = col2['MW011',cv()]
30 , col3['MW014',any] = col3['MW011',cv()]
31 , value['MW014',any] = value['MW011',cv()] + value['MW012',cv()]
32 , col1['MW015',for rn from 1 to countcode['MW011'] increment 1] = col1['MW011',cv()]
33 , col2['MW015',any] = col2['MW011',cv()]
34 , col3['MW015',any] = col3['MW011',cv()]
35 , value['MW015',any] = value['MW013',cv()] + value['MW014',cv()]
36 )
37 order by code
38 , col1
39 , col2
40 , col3
41 /

CODE RN COL1 COL2 COL3 VALUE
----- ---------- ---------- ---------- ---------- ----------
MW006 1 1 1 1 32
MW006 2 1 1 2 34
MW006 3 1 1 3 36
MW006 4 1 1 4 38
MW007 1 1 1 1 1
MW007 2 1 1 2 2
MW007 3 1 1 3 3
MW007 4 1 1 4 4
MW008 1 1 1 1 92
MW008 2 1 1 2 94
MW008 3 1 1 3 96
MW008 4 1 1 4 98
MW009 1 1 1 1 93
MW009 2 1 1 2 96
MW009 3 1 1 3 99
MW009 4 1 1 4 102
MW014 1 11 1 1 32
MW014 2 11 1 2 34
MW014 3 11 1 3 36
MW015 1 11 1 1 63
MW015 2 11 1 2 66
MW015 3 11 1 3 69

22 rows selected.


The key point here is to use the FOR loop to generate the rows. To do so, I need to index all entries by means of the ROW_NUMBER analytic function, and use a reference model to know how many rows to generate. It works, but it is more complex than the UPSERT ALL alternative, and there are a lot more rules this way, because each measure now needs its own rule.

Luckily they turned down the third alternative, but they could perfectly well have used the UPSERT ALL variant. The advice came too late, however, so they headed towards a totally different approach; very likely one that does not involve materialized views with a model clause query.

The complete UPSERT ALL solution looks like this:

SQL> select *
2 from mytable
3 model
4 return updated rows
5 dimension by (code, code code2, col1, col2, col3)
6 measures (value)
7 rules upsert all
8 ( value ['MW006', code2 = 'MW001', col1, col2, col3]
9 = value['MW001', 'MW001', cv(), cv(), cv()]
10 + value['MW002', 'MW002', cv(), cv(), cv()]
11 , value ['MW007', code2 = 'MW001', col1, col2, col3]
12 = value['MW006', 'MW001', cv(), cv(), cv()]
13 - value['MW003', 'MW003', cv(), cv(), cv()]
14 , value ['MW008', code2 = 'MW001', col1, col2, col3]
15 = value['MW004', 'MW004', cv(), cv(), cv()]
16 + value['MW005', 'MW005', cv(), cv(), cv()]
17 , value ['MW009', code2 = 'MW001', col1, col2, col3]
18 = value['MW007', 'MW001', cv(), cv(), cv()]
19 + value['MW008', 'MW001', cv(), cv(), cv()]
20 , value ['MW014', code2 = 'MW011', col1, col2, col3]
21 = value['MW011', 'MW011', cv(), cv(), cv()]
22 + value['MW012', 'MW012', cv(), cv(), cv()]
23 , value ['MW015', code2 = 'MW011', col1, col2, col3]
24 = value['MW013', 'MW013', cv(), cv(), cv()]
25 + value['MW014', 'MW011', cv(), cv(), cv()]
26 )
27 order by code
28 , col1
29 , col2
30 , col3
31 /

CODE CODE2 COL1 COL2 COL3 VALUE
----- ----- ---------- ---------- ---------- ----------
MW006 MW001 1 1 1 32
MW006 MW001 1 1 2 34
MW006 MW001 1 1 3 36
MW006 MW001 1 1 4 38
MW007 MW001 1 1 1 1
MW007 MW001 1 1 2 2
MW007 MW001 1 1 3 3
MW007 MW001 1 1 4 4
MW008 MW001 1 1 1 92
MW008 MW001 1 1 2 94
MW008 MW001 1 1 3 96
MW008 MW001 1 1 4 98
MW009 MW001 1 1 1 93
MW009 MW001 1 1 2 96
MW009 MW001 1 1 3 99
MW009 MW001 1 1 4 102
MW014 MW011 11 1 1 32
MW014 MW011 11 1 2 34
MW014 MW011 11 1 3 36
MW015 MW011 11 1 1 63
MW015 MW011 11 1 2 66
MW015 MW011 11 1 3 69

22 rows selected.


Or maybe for sake of clarity you could separate the generating of new rows based on original base sets from the calculation of the new values like this:

SQL> select *
2 from ( select code, col1, col2, col3, value
3 from mytable
4 model
5 dimension by (code, code code2, col1, col2, col3)
6 measures (value)
7 rules upsert all
8 ( value['MW006', code2 = 'MW001', col1, col2, col3] = 0
9 , value['MW007', code2 = 'MW001', col1, col2, col3] = 0
10 , value['MW008', code2 = 'MW001', col1, col2, col3] = 0
11 , value['MW009', code2 = 'MW001', col1, col2, col3] = 0
12 , value['MW014', code2 = 'MW011', col1, col2, col3] = 0
13 , value['MW015', code2 = 'MW011', col1, col2, col3] = 0
14 )
15 )
16 model
17 return updated rows
18 dimension by (code, col1, col2, col3)
19 measures (value)
20 rules automatic order
21 ( value['MW006', col1, col2, col3]
22 = value['MW001', cv(), cv(), cv()]
23 + value['MW002', cv(), cv(), cv()]
24 , value['MW008', col1, col2, col3]
25 = value['MW004', cv(), cv(), cv()]
26 + value['MW005', cv(), cv(), cv()]
27 , value['MW015', col1, col2, col3]
28 = value['MW013', cv(), cv(), cv()]
29 + value['MW014', cv(), cv(), cv()]
30 , value['MW009', col1, col2, col3]
31 = value['MW007', cv(), cv(), cv()]
32 + value['MW008', cv(), cv(), cv()]
33 , value['MW014', col1, col2, col3]
34 = value['MW011', cv(), cv(), cv()]
35 + value['MW012', cv(), cv(), cv()]
36 , value['MW007', col1, col2, col3]
37 = value['MW006', cv(), cv(), cv()]
38 - value['MW003', cv(), cv(), cv()]
39 )
40 order by code
41 , col1
42 , col2
43 , col3
44 /

CODE COL1 COL2 COL3 VALUE
----- ---------- ---------- ---------- ----------
MW006 1 1 1 32
MW006 1 1 2 34
MW006 1 1 3 36
MW006 1 1 4 38
MW007 1 1 1 1
MW007 1 1 2 2
MW007 1 1 3 3
MW007 1 1 4 4
MW008 1 1 1 92
MW008 1 1 2 94
MW008 1 1 3 96
MW008 1 1 4 98
MW009 1 1 1 93
MW009 1 1 2 96
MW009 1 1 3 99
MW009 1 1 4 102
MW014 11 1 1 32
MW014 11 1 2 34
MW014 11 1 3 36
MW015 11 1 1 63
MW015 11 1 2 66
MW015 11 1 3 69

22 rows selected.


UPDATE


Today I found out that he did use UPSERT ALL after all, and that my phrase "it is also <...> absolutely necessary to know against which base set the rows should be generated." is worded too strongly.

He used this query:

SQL> select *
2 from ( select *
3 from mytable
4 model
5 return updated rows
6 dimension by (code, col1, col2, col3)
7 measures (value)
8 rules upsert all
9 ( value ['MW006', col1, col2, col3]
10 = case
11 when value['MW001', cv(), cv(), cv()] is not null
12 or value['MW002', cv(), cv(), cv()] is not null
13 then
14 nvl(value['MW001', cv(), cv(), cv()],0)
15 + nvl(value['MW002', cv(), cv(), cv()],0)
16 end
17 , value ['MW007', col1, col2, col3]
18 = case
19 when value['MW006', cv(), cv(), cv()] is not null
20 or value['MW003', cv(), cv(), cv()] is not null
21 then
22 nvl(value['MW006', cv(), cv(), cv()],0)
23 - nvl(value['MW003', cv(), cv(), cv()],0)
24 end
25 , value ['MW008', col1, col2, col3]
26 = case
27 when value['MW004', cv(), cv(), cv()] is not null
28 or value['MW005', cv(), cv(), cv()] is not null
29 then
30 nvl(value['MW004', cv(), cv(), cv()],0)
31 + nvl(value['MW005', cv(), cv(), cv()],0)
32 end
33 , value ['MW009', col1, col2, col3]
34 = case
35 when value['MW007', cv(), cv(), cv()] is not null
36 or value['MW008', cv(), cv(), cv()] is not null
37 then
38 nvl(value['MW007', cv(), cv(), cv()],0)
39 + nvl(value['MW008', cv(), cv(), cv()],0)
40 end
41 , value ['MW014', col1, col2, col3]
42 = case
43 when value['MW011', cv(), cv(), cv()] is not null
44 or value['MW012', cv(), cv(), cv()] is not null
45 then
46 nvl(value['MW011', cv(), cv(), cv()],0)
47 + nvl(value['MW012', cv(), cv(), cv()],0)
48 end
49 , value ['MW015', col1, col2, col3]
50 = case
51 when value['MW013', cv(), cv(), cv()] is not null
52 or value['MW014', cv(), cv(), cv()] is not null
53 then
54 nvl(value['MW013', cv(), cv(), cv()],0)
55 + nvl(value['MW014', cv(), cv(), cv()],0)
56 end
57 )
58 )
59 where value is not null
60 order by code
61 , col1
62 , col2
63 , col3
64 /

CODE COL1 COL2 COL3 VALUE
----- ---------- ---------- ---------- ----------
MW006 1 1 1 32
MW006 1 1 2 34
MW006 1 1 3 36
MW006 1 1 4 38
MW007 1 1 1 1
MW007 1 1 2 2
MW007 1 1 3 3
MW007 1 1 4 4
MW008 1 1 1 92
MW008 1 1 2 94
MW008 1 1 3 96
MW008 1 1 4 98
MW009 1 1 1 93
MW009 1 1 2 96
MW009 1 1 3 99
MW009 1 1 4 102
MW014 11 1 1 32
MW014 11 1 2 34
MW014 11 1 3 36
MW015 11 1 1 63
MW015 11 1 2 66
MW015 11 1 3 69

22 rows selected.


The big advantages are that it is no longer necessary to refer to an original base set, and that it is a lot faster than the solution using a query per rule.

However, it is slower than the solution using the extra dimension and a reference to the base set, because a lot of unnecessary rows are generated that are filtered out at the end by the predicate "value is not null". It feels a lot like forgetting a join condition, ending up with lots of duplicate rows, and filtering them out with the DISTINCT keyword.

Very ugly, but it has its advantages.

UPDATE TWO


Volder has left a comment showing an alternative query. By putting the col1, col2 and col3 columns in the partition clause, the rules become very simple. And as can be seen in the motto of this blog, I really like simple solutions. His solution is this:

SQL> select *
2 from ( select code
3 , col1
4 , col2
5 , col3
6 , value
7 from mytable
8 model
9 return updated rows
10 partition by (col1, col2, col3)
11 dimension by (code)
12 measures(value)
13 rules
14 ( value['MW006'] = value['MW001'] + value['MW002']
15 , value['MW007'] = value['MW006'] - value['MW003']
16 , value['MW008'] = value['MW004'] + value['MW005']
17 , value['MW009'] = value['MW007'] + value['MW008']
18 , value['MW014'] = value['MW011'] + value['MW012']
19 , value['MW015'] = value['MW013'] + value['MW014']
20 )
21 )
22 where value is not null
23 order by code
24 , col1
25 , col2
26 , col3
27 /

CODE COL1 COL2 COL3 VALUE
----- ---------- ---------- ---------- ----------
MW006 1 1 1 32
MW006 1 1 2 34
MW006 1 1 3 36
MW006 1 1 4 38
MW007 1 1 1 1
MW007 1 1 2 2
MW007 1 1 3 3
MW007 1 1 4 4
MW008 1 1 1 92
MW008 1 1 2 94
MW008 1 1 3 96
MW008 1 1 4 98
MW009 1 1 1 93
MW009 1 1 2 96
MW009 1 1 3 99
MW009 1 1 4 102
MW014 11 1 1 32
MW014 11 1 2 34
MW014 11 1 3 36
MW015 11 1 1 63
MW015 11 1 2 66
MW015 11 1 3 69

22 rows selected.


The query creates seven partitions for the seven unique combinations of col1, col2 and col3, and for each of those partitions six new cells are calculated, resulting in 42 rows. By inlining the query and filtering out the rows with a NULL value, the query returns the 22 requested new rows.

So in essence it is the same as the query that is currently being used, which I described in the first update section. That query also generated too many rows - also 42 - that had to be filtered out in the outer where clause. However, you have to agree that Volder's query looks much, much easier.

Personally I still like the query using UPSERT ALL and the extra dimension "code2" the most, because only the necessary rows are generated. The new one also feels like forgetting a join condition and using distinct. But Volder's query is surely superior in all aspects to the query currently being used.

Tuesday, November 6, 2007

RuleGen

Today I visited a session at Centraal Boekhuis by Toon Koppelaars about his new product RuleGen. In short, the product is capable of generating all code needed for the implementation of business rules.

The easy ones (attribute and tuple rules) should be implemented using the declarative integrity constraints offered by Oracle: primary key, unique key, foreign key, not null and check constraints. The more challenging static constraints, the entity and inter-entity constraints, can be handled perfectly by RuleGen. Change event rules ("if A happens, then do B") should be considered business logic rather than static database constraints, and are best handled in a calling API.
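For completeness, the declarative constraints mentioned above look like this. The table and constraint names are invented, and a dept table is assumed to exist:

```sql
create table emp
( empno  number(4)    constraint emp_pk primary key
, ename  varchar2(10) not null                             -- attribute rule
, job    varchar2(9)  constraint emp_job_chk
                      check (job in ('CLERK','MANAGER','PRESIDENT','ADMIN'))
, sal    number(7,2)  constraint emp_sal_chk check (sal > 0)
, comm   number(7,2)
, deptno number(2)    constraint emp_dept_fk references dept(deptno)
, constraint emp_comm_chk check (comm is null or comm <= sal) -- tuple rule
)
/
```

Everything beyond this, such as rules spanning multiple rows or multiple tables, is where RuleGen comes in.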

Examples of the types of constraints shown in RuleGen are:
- There can only be zero or one employee with job 'PRESIDENT'
- Each department with a PRESIDENT or MANAGER should have at least one ADMIN.
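To make the first rule concrete: this is not how RuleGen implements it (RuleGen generates trigger and serialization code), but a well-known trick can enforce it declaratively with a unique function-based index, assuming an emp table with a job column:

```sql
-- Non-PRESIDENT rows map to NULL, and entirely NULL keys are not indexed,
-- so the unique index allows at most one row with job 'PRESIDENT'.
create unique index emp_one_president
on emp ( case job when 'PRESIDENT' then 'X' end )
/
```

The second rule, involving multiple rows per department, has no such declarative equivalent and genuinely needs generated validation code.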

Although I have stated before that database triggers are evil, the main disadvantages I mentioned there seem to be addressed by this framework:

1) Database triggers make things happen "automagically"

This does not apply here, since RuleGen only considers the static database constraints.

2) Database triggers are rarely implemented correctly

Here the appropriate locking code is generated, using a dbms_lock call. In the version shown it was optional, but it will soon be mandatory, and it can be derived from the rule definition.
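The serialization idea can be sketched as follows. This is not RuleGen's generated code, just an illustration of how dbms_lock can serialize rule validation per department (the lock name is made up):

```sql
declare
  l_handle varchar2(128);
  l_result integer;
begin
  -- map a name to a unique lock handle (survives across sessions)
  dbms_lock.allocate_unique
  ( lockname   => 'EMP_RULES_DEPT_10'
  , lockhandle => l_handle
  );
  -- take an exclusive lock; release_on_commit ties it to the transaction
  l_result := dbms_lock.request
              ( lockhandle        => l_handle
              , lockmode          => dbms_lock.x_mode
              , release_on_commit => true
              );
  if l_result not in (0, 4)  -- 0 = success, 4 = we already own the lock
  then
    raise_application_error(-20001, 'could not serialize rule validation');
  end if;
  -- validate the rule here; the lock is released at commit or rollback
end;
/
```

Two sessions modifying the same department then validate the rule one after the other, closing the window in which both could read a state that is consistent on its own but not combined.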

3) Database triggers slow down DML

This will still be true, since row level database triggers are generated and thus the dreaded context switches still have to be performed. However, the trigger code itself seems to be relatively lightweight - I'll have to test this. And more importantly, for multirow DML the rule can be checked only once instead of "for each row". Undoubtedly RuleGen will outperform CDM RuleFrame. In the near future, I hope to be able to investigate how well RuleGen performs compared to the API approach.

To be continued...


UPDATE

You can read the results of the performance test here: RuleGen: the test

Friday, November 2, 2007

AMIS session Oracle11g for Developers

Yesterday I visited the AMIS "knowledge center event" about Oracle11g for Developers. They had presented a compressed version of the session twice before, as described here on their blog. Yesterday's was an extended version. Almost all new features were discussed and demonstrated with illustrative example scripts.

The features I am most enthusiastic about are (for now):

  1. flashback data archive
  2. pivot/unpivot
  3. for update skip locked
  4. the removal of the PLS-00436 implementation restriction I mentioned earlier here
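To illustrate why the third feature is so welcome: it lets several sessions dequeue work from the same table without blocking each other. The table below is invented; each session simply skips rows already locked by another session:

```sql
-- Session 1 locks a set of unprocessed rows; session 2, running the same
-- statement, skips those locked rows and gets a different set.
select id, payload
from work_queue
where processed = 'N'
and rownum <= 10
for update skip locked
/
```

Before 11g officially exposed this, you had to resort to dbms_aq or undocumented behavior to get non-blocking queue semantics in plain SQL.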

Another lesson learned was why you shouldn't improvise by messing up EMP's data during a session like this ...

Anyway, thank you Lucas and Alex for a very interesting evening. And for the pizza and pasta of course :-)

Que faire quand votre requête est trop lente ?

No, I will keep on writing in English. But this is the title of a post on developpez.com by Pierre Forstmann. I stumbled upon the link while trying to find out whether this blog is being picked up by Google.

Pierre has translated an OTN post of mine called "When your query takes too long ...". The post was originally meant as a reference for all OTN questions where only a query is given, along with a question like "Please tell me how to tune this query". It basically tells the poster to start investigating by collecting some data from explain plan and tkprof. It is nice to see it in French; since I had six years of French at school, I am able to read it as well. And the fact that I already know what is being said helps too, of course.

I am wondering if more languages will follow :-)