Thursday, December 27, 2007
I don't know about other countries, but here in the Netherlands every radio station seems to have its own top 50/100/500/2000 of hits. So I decided to go along with the trend and make my own top 3 ... of OTN forum posts that made me laugh in 2007:
1) How much a three year old DBA is worth?
2) PL/SQL New Development Standard: Frameworkia
3) Why Padfield earns twice as much as Robertson
Have fun reading, and since this will be my last post for 2007: I wish all readers a happy and healthy 2008.
Monday, December 24, 2007
Check constraints with sysdate
Implementing a check constraint with sysdate in it is not possible. Not until 11g, that is, where you can use a trick with virtual columns. Here is a small demo:
First, let's create a table and try to add a check constraint stating that a date must lie in the past:
rwijk@ORA11G> create table mytable
2 ( id number(4)
3 , mydate date
4 )
5 /
Table created.
rwijk@ORA11G> alter table mytable add constraint myck1 check (mydate < sysdate)
2 /
alter table mytable add constraint myck1 check (mydate < sysdate)
*
ERROR at line 1:
ORA-02436: date or system variable wrongly specified in CHECK constraint
As expected, ORA-02436 (date or system variable wrongly specified in CHECK constraint) is returned.
Wrapping the check inside a function does not help:
rwijk@ORA11G> create function mydate_in_past
2 ( p_mydate in mytable.mydate%type
3 ) return varchar2
4 is
5 begin
6 return case when p_mydate >= sysdate then 'N' else 'Y' end;
7 end mydate_in_past;
8 /
Function created.
rwijk@ORA11G> alter table mytable
2 add constraint myck1 check (mydate_in_past(mydate) = 'Y')
3 /
add constraint myck1 check (mydate_in_past(mydate) = 'Y')
*
ERROR at line 2:
ORA-00904: "MYDATE_IN_PAST": invalid identifier
That is because a function call is not allowed inside a check constraint. So let's add the function as a virtual column to the table:
rwijk@ORA11G> alter table mytable
2 add (mydate_in_past_ind as (mydate_in_past(mydate)))
3 /
add (mydate_in_past_ind as (mydate_in_past(mydate)))
*
ERROR at line 2:
ORA-30553: The function is not deterministic
Oracle is clever enough to know that you can only use deterministic functions for virtual columns. And a function using sysdate is of course never deterministic: validating an input of 25-12-2007 returns 'N' today, but in a few days it will return 'Y'. But let's try to fool Oracle by simply declaring the function deterministic, even though it is not:
rwijk@ORA11G> create or replace function mydate_in_past
2 ( p_mydate in mytable.mydate%type
3 ) return varchar2 deterministic
4 is
5 begin
6 return case when p_mydate >= sysdate then 'N' else 'Y' end;
7 end mydate_in_past;
8 /
Function created.
rwijk@ORA11G> alter table mytable
2 add (mydate_in_past_ind as (cast(mydate_in_past(mydate) as varchar2(1))))
3 /
Table altered.
rwijk@ORA11G> desc mytable
Name Null? Type
-------------------------------------------------------------------------- -------- --------------------
ID NUMBER(4)
MYDATE DATE
MYDATE_IN_PAST_IND VARCHAR2(1)
It works! And now you can check whether mydate is a date in the past:
rwijk@ORA11G> alter table mytable
2 add constraint myck1 check (mydate_in_past_ind = 'Y')
3 /
Table altered.
So we have fooled Oracle into believing the function is deterministic. We can reason that in this case it does no harm: once a row satisfies the constraint, it will keep satisfying it as time passes.
Let's check if the constraint works:
rwijk@ORA11G> insert into mytable (id,mydate) values (1,date '2007-01-01')
2 /
1 row created.
rwijk@ORA11G> insert into mytable (id,mydate) values (2,date '2020-01-01')
2 /
insert into mytable (id,mydate) values (2,date '2020-01-01')
*
ERROR at line 1:
ORA-02290: check constraint (RWIJK.MYCK1) violated
Great, but you have to be careful.
Watch what happens when checking whether a date is in the future. This can never be a real constraint, as there will always come a time when the date is no longer in the future. But using the same technique as above, it is possible:
rwijk@ORA11G> rollback
2 /
Rollback complete.
rwijk@ORA11G> alter table mytable add (mydate2 date)
2 /
Table altered.
rwijk@ORA11G> create function mydate_in_future
2 ( p_mydate in mytable.mydate%type
3 ) return varchar2 deterministic
4 is
5 begin
6 return case when p_mydate > sysdate then 'Y' else 'N' end;
7 end mydate_in_future;
8 /
Function created.
rwijk@ORA11G> alter table mytable
2 add (mydate2_in_future_ind as (cast(mydate_in_future(mydate2) as varchar2(1))))
3 /
Table altered.
rwijk@ORA11G> alter table mytable
2 add constraint myck2 check (mydate2_in_future_ind = 'Y')
3 /
Table altered.
rwijk@ORA11G> desc mytable
Name Null? Type
-------------------------------------------------------------------------- -------- -----------------
ID NUMBER(4)
MYDATE DATE
MYDATE_IN_PAST_IND VARCHAR2(1)
MYDATE2 DATE
MYDATE2_IN_FUTURE_IND VARCHAR2(1)
rwijk@ORA11G> insert into mytable (id,mydate2) values (1,date '2007-01-01')
2 /
insert into mytable (id,mydate2) values (1,date '2007-01-01')
*
ERROR at line 1:
ORA-02290: check constraint (RWIJK.MYCK2) violated
rwijk@ORA11G> insert into mytable (id,mydate2) values (2,date '2020-01-01')
2 /
1 row created.
So it is quite easy to misuse this feature. But when used carefully, it is useful for implementing more business rules declaratively.
Previously, this type of constraint would be implemented using a database trigger. But we all know by now :-) that database triggers are evil. For those of us who don't know this yet, let's do a small performance test to prove it once again, after dropping all previous objects:
rwijk@ORA11G> create function mydate_in_past
2 ( p_mydate in date
3 ) return varchar2 deterministic
4 is
5 begin
6 return case when p_mydate >= sysdate then 'N' else 'Y' end;
7 end mydate_in_past;
8 /
Function created.
rwijk@ORA11G> create table mytable_virtual_column
2 ( id number(6)
3 , mydate date
4 , mydate_in_past_ind as (cast(mydate_in_past(mydate) as varchar2(1)))
5 , constraint myck1 check (mydate_in_past_ind = 'Y')
6 )
7 /
Table created.
rwijk@ORA11G> create table mytable_database_trigger
2 ( id number(6)
3 , mydate date
4 )
5 /
Table created.
rwijk@ORA11G> create trigger mytrigger
2 before insert or update on mytable_database_trigger
3 for each row
4 begin
5 if inserting
6 or updating('mydate')
7 then
8 if :new.mydate >= sysdate
9 then
10 raise_application_error(-20000,'mydate must be a date in the past.');
11 end if;
12 end if;
13 end;
14 /
Trigger created.
rwijk@ORA11G> exec runstats_pkg.rs_start
PL/SQL procedure successfully completed.
rwijk@ORA11G> insert into mytable_virtual_column (id,mydate)
2 select level
3 , date '2007-01-01'
4 from dual
5 connect by level <= 100000
6 /
100000 rows created.
rwijk@ORA11G> exec runstats_pkg.rs_middle
PL/SQL procedure successfully completed.
rwijk@ORA11G> insert into mytable_database_trigger (id,mydate)
2 select level
3 , date '2007-01-01'
4 from dual
5 connect by level <= 100000
6 /
100000 rows created.
rwijk@ORA11G> exec runstats_pkg.rs_stop(100)
Run1 ran in 42 hsecs
Run2 ran in 172 hsecs
Run1 ran in 24.42% of the time
Name Run1 Run2 Difference
STAT.DB time 43 168 125
STAT.CPU used when call started 42 169 127
STAT.CPU used by this session 39 169 130
STAT.redo size 2,458,396 2,458,560 164
LATCH.object queue header operation 773 994 221
STAT.free buffer inspected 0 250 250
Run1 latches total versus run2 -- difference and percentage
Run1 Run2 Diff Pct
15,027 15,528 501 96.77%
PL/SQL procedure successfully completed.
I repeated the test several times, and got consistent results: using declarative constraints with virtual columns is approximately four times faster than using those dreaded database triggers.
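By the way, the figures above come from Tom Kyte's runstats_pkg. If you don't have that package installed, a rough elapsed-time comparison can be made with dbms_utility.get_time; a minimal sketch under that assumption (it measures centiseconds only, not the latch and redo statistics shown above):

declare
  l_start number;
begin
  -- time the insert into the table with the virtual column constraint
  l_start := dbms_utility.get_time;
  insert into mytable_virtual_column (id, mydate)
  select level, date '2007-01-01' from dual connect by level <= 100000;
  dbms_output.put_line('virtual column: ' || to_char(dbms_utility.get_time - l_start) || ' hsecs');
  -- time the insert into the table with the database trigger
  l_start := dbms_utility.get_time;
  insert into mytable_database_trigger (id, mydate)
  select level, date '2007-01-01' from dual connect by level <= 100000;
  dbms_output.put_line('trigger       : ' || to_char(dbms_utility.get_time - l_start) || ' hsecs');
  rollback; -- keep the test repeatable
end;
/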
Friday, December 21, 2007
Simulating print_table with unpivot
With the advent of UNPIVOT in Oracle 11g, a SQL-only solution for printing table contents vertically has become a little easier. I've used Tom Kyte's print_table function for this purpose occasionally; you can read about this function here on AskTom. The following SQL using unpivot is definitely not easier than issuing exec print_table('select * from emp'), so it's just for fun:
rwijk@ORA11G> select decode
2 ( col
3 , 'LINE', '======================'
4 , rpad(col,8) || ': ' || value
5 ) "Contents of table EMP"
6 from ( select to_char(empno) empno
7 , ename
8 , job
9 , to_char(mgr) mgr
10 , to_char(hiredate,'dd-mm-yyyy') hiredate
11 , to_char(sal) sal
12 , to_char(comm) comm
13 , to_char(deptno) deptno
14 , '=' line
15 from emp
16 )
17 unpivot include nulls
18 (value for col in (empno,ename,job,mgr,hiredate,sal,comm,deptno,line))
19 /
Contents of table EMP
--------------------------------------------------
EMPNO : 7369
ENAME : SMITH
JOB : CLERK
MGR : 7902
HIREDATE: 17-12-1980
SAL : 800
COMM :
DEPTNO : 20
======================
EMPNO : 7499
ENAME : ALLEN
JOB : SALESMAN
MGR : 7698
HIREDATE: 20-02-1981
SAL : 1600
COMM : 300
DEPTNO : 30
======================
EMPNO : 7521
ENAME : WARD
JOB : SALESMAN
MGR : 7698
HIREDATE: 22-02-1981
SAL : 1250
COMM : 500
DEPTNO : 30
======================
EMPNO : 7566
ENAME : JONES
JOB : MANAGER
MGR : 7839
HIREDATE: 02-04-1981
SAL : 2975
COMM :
DEPTNO : 20
======================
EMPNO : 7654
ENAME : MARTIN
JOB : SALESMAN
MGR : 7698
HIREDATE: 28-09-1981
SAL : 1250
COMM : 1400
DEPTNO : 30
======================
EMPNO : 7698
ENAME : BLAKE
JOB : MANAGER
MGR : 7839
HIREDATE: 01-05-1981
SAL : 2850
COMM :
DEPTNO : 30
======================
EMPNO : 7782
ENAME : CLARK
JOB : MANAGER
MGR : 7839
HIREDATE: 09-06-1981
SAL : 2450
COMM :
DEPTNO : 10
======================
EMPNO : 7788
ENAME : SCOTT
JOB : ANALYST
MGR : 7566
HIREDATE: 09-12-1982
SAL : 3000
COMM :
DEPTNO : 20
======================
EMPNO : 7839
ENAME : KING
JOB : PRESIDENT
MGR :
HIREDATE: 17-11-1981
SAL : 5000
COMM :
DEPTNO : 10
======================
EMPNO : 7844
ENAME : TURNER
JOB : SALESMAN
MGR : 7698
HIREDATE: 08-09-1981
SAL : 1500
COMM : 0
DEPTNO : 30
======================
EMPNO : 7876
ENAME : ADAMS
JOB : CLERK
MGR : 7788
HIREDATE: 12-01-1983
SAL : 1100
COMM :
DEPTNO : 20
======================
EMPNO : 7900
ENAME : JAMES
JOB : CLERK
MGR : 7698
HIREDATE: 03-12-1981
SAL : 950
COMM :
DEPTNO : 30
======================
EMPNO : 7902
ENAME : FORD
JOB : ANALYST
MGR : 7566
HIREDATE: 03-12-1981
SAL : 3000
COMM :
DEPTNO : 20
======================
EMPNO : 7934
ENAME : MILLER
JOB : CLERK
MGR : 7782
HIREDATE: 23-01-1982
SAL : 1300
COMM :
DEPTNO : 10
======================
126 rows selected.
Note that I have to convert all columns to varchar2 to be able to unpivot them. Also, the non-default "include nulls" clause is used to explicitly generate rows when the COMM column is null.
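As a minimal illustration of that datatype requirement (mytab and its columns are invented for this sketch), all columns listed in the unpivot clause must share one datatype, hence the conversion:

select id
     , col
     , value
  from ( select id
              , to_char(num_col) num_col -- NUMBER converted to match txt_col
              , txt_col
           from mytab
       )
 unpivot include nulls
 (value for col in (num_col, txt_col))
/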
Saturday, December 15, 2007
Parallelism in a skip locked scenario
During the Oracle 11g for developers session, more specifically when I was talking about FOR UPDATE SKIP LOCKED, I received a question about the possibility of implementing parallelism if a batch program used the FOR UPDATE SKIP LOCKED strategy. Would it work to just kick off N processes simultaneously without altering the code?
Normally you would have to group the workload somehow - probably using the NTILE analytic function - and assign a piece of the workload to each of the slaves. But with FOR UPDATE SKIP LOCKED, if one of the processes locks some rows, the other processes should skip those rows and lock some others, and so on. An interesting question, and I didn't know the answer right away, so I just tested it.
Let's create a table with a processed flag. This flag is necessary in a skip locked scenario, as you need to be able to track down which rows were processed and which ones were not. You can handle the ones that were skipped the next day or the next week.
rwijk@ORA11G> create table parallel_skiplocked_test
2 (id, procesnumber, processed_flag, amount, processed_timestamp)
3 as
4 select level
5 , cast(null as integer)
6 , 'N'
7 , cast(null as number(9))
8 , cast(null as timestamp)
9 from dual
10 connect by level <= 100000
11 /
Table created.
rwijk@ORA11G> alter table parallel_skiplocked_test add constraint pst_pk primary key (id)
2 /
Table altered.
rwijk@ORA11G> exec dbms_stats.gather_table_stats(user,'parallel_skiplocked_test',cascade=>true)
PL/SQL procedure successfully completed.
And create a procedure that processes all rows using the FOR UPDATE SKIP LOCKED clause:
rwijk@ORA11G> create procedure process_all(p_procesnumber in integer)
2 is
3 cursor c_pst
4 is
5 select id
6 , amount
7 from parallel_skiplocked_test
8 where processed_flag = 'N'
9 for update skip locked
10 ;
11 type tr_pst is record
12 ( id parallel_skiplocked_test.id%type
13 , amount parallel_skiplocked_test.amount%type
14 );
15 type t_pst is table of tr_pst;
16 r_pst t_pst;
17 begin
18 open c_pst;
19 loop
20 fetch c_pst bulk collect into r_pst limit 100
21 ;
22 for i in 1..r_pst.count
23 loop
24 r_pst(i).amount := trunc(mod(10000 * i * dbms_random.value,10000));
25 end loop
26 ;
27 forall i in 1..r_pst.count
28 update parallel_skiplocked_test
29 set procesnumber = p_procesnumber
30 , amount = r_pst(i).amount
31 , processed_flag = 'Y'
32 , processed_timestamp = systimestamp
33 where id = r_pst(i).id
34 ;
35 exit when c_pst%notfound
36 ;
37 end loop;
38 close c_pst;
39 end process_all;
40 /
Procedure created.
Now let's run the procedure in a single process using dbms_job:
rwijk@ORA11G> declare
2 l_job_id1 binary_integer;
3 begin
4 dbms_job.submit(l_job_id1,'process_all(1);');
5 end;
6 /
PL/SQL procedure successfully completed.
rwijk@ORA11G> commit
2 /
Commit complete.
This is the result:
rwijk@ORA11G> select procesnumber
2 , count(*)
3 , min(processed_timestamp) min_timestamp
4 , max(processed_timestamp) max_timestamp
5 from parallel_skiplocked_test
6 group by procesnumber
7 /
PROCESNUMBER COUNT(*) MIN_TIMESTAMP MAX_TIMESTAMP
------------ ---------- ------------------------- -------------------------
1 100000 15-12-07 01:04:20,734000 15-12-07 01:04:30,375000
1 row selected.
It takes 10 seconds.
Now let's try with 4 processes simultaneously, after recreating the entire scenario the same way as above:
rwijk@ORA11G> declare
2 l_job_id1 binary_integer;
3 l_job_id2 binary_integer;
4 l_job_id3 binary_integer;
5 l_job_id4 binary_integer;
6 begin
7 dbms_job.submit(l_job_id1,'process_all(1);');
8 dbms_job.submit(l_job_id2,'process_all(2);');
9 dbms_job.submit(l_job_id3,'process_all(3);');
10 dbms_job.submit(l_job_id4,'process_all(4);');
11 end;
12 /
PL/SQL procedure successfully completed.
rwijk@ORA11G> commit
2 /
Commit complete.
rwijk@ORA11G> pause
rwijk@ORA11G> select procesnumber
2 , count(*)
3 , min(processed_timestamp) min_timestamp
4 , max(processed_timestamp) max_timestamp
5 from parallel_skiplocked_test
6 group by procesnumber
7 /
PROCESNUMBER COUNT(*) MIN_TIMESTAMP MAX_TIMESTAMP
------------ ---------- ------------------------- -------------------------
1 23555 15-12-07 01:05:05,734000 15-12-07 01:05:16,656000
2 25445 15-12-07 01:05:05,953000 15-12-07 01:05:17,703000
4 26400 15-12-07 01:05:05,968000 15-12-07 01:05:16,656000
3 24600 15-12-07 01:05:05,796000 15-12-07 01:05:16,640000
4 rows selected.
And look: it works! All 4 processes did approximately 25,000 rows each. However, instead of 10 seconds once, you have now consumed 11 or 12 seconds four times, likely because all four processes are hitting the same resources at the same time, causing contention. So it is of no practical use, but still fun to know. And thanks to Tijmen for the interesting question.
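For completeness, a minimal sketch of the classic NTILE alternative mentioned above (the bucket count of 4 is arbitrary): each job restricts itself to its own bucket up front instead of relying on skip locked, so the processes never touch each other's rows:

select id
     , ntile(4) over (order by id) bucket -- spread the ids evenly over 4 buckets
  from parallel_skiplocked_test
 where processed_flag = 'N'
/

Job number n would then process only the rows it finds in bucket n.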
Friday, December 14, 2007
Several 11g oddities
In an earlier post, about setting up journaling using flashback data archives, I mentioned the risk of losing undo information when adding a constraint to the table. During the preparation for the Oracle 11g for developers session I discovered four oddities:
1) Result Cache:
In the concepts manual, there is a picture of the Result Cache in the SGA. The drawing tells us that the Result Cache is a separate memory area, just like the Shared Pool, Large Pool or Buffer Cache. However, when executing the procedure dbms_result_cache.memory_report, the output shows this:
rwijk@ORA11G> exec dbms_result_cache.memory_report
R e s u l t C a c h e M e m o r y R e p o r t
[Parameters]
Block Size = 1K bytes
Maximum Cache Size = 1056K bytes (1056 blocks)
Maximum Result Size = 52K bytes (52 blocks)
[Memory]
Total Memory = 5140 bytes [0.003% of the Shared Pool]
... Fixed Memory = 5140 bytes [0.003% of the Shared Pool]
... Dynamic Memory = 0 bytes [0.000% of the Shared Pool]
This makes you believe that the Result Cache is part of the Shared Pool. So which of the two is true? The important part for me is knowing that it is in shared memory, but the difference is strange nonetheless.
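Wherever the memory lives, using the cache itself is simple; a minimal example with the result_cache hint (any cacheable query will do):

select /*+ result_cache */ deptno
     , count(*)
  from emp
 group by deptno
/

A second execution of the same statement can then fetch the stored result instead of re-executing the query.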
2) Reference partitioning
Suppose the following situation: three tables in a master-detail-detail relationship. The master is range partitioned, the child table is reference partitioned, and the grandchild table is also reference partitioned. When trying to insert a row into the grandchild table that violates the foreign key, you'd expect the rather well-known error message ORA-02291, integrity constraint violated. The error message you receive, however, is this one:
ORA-14400: inserted partition key does not map to any partition
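For context, a minimal sketch of such a chain (all names invented; note that the partitioning foreign key columns must be not null):

create table master
( id         number(10) constraint mst_pk primary key
, order_date date not null
) partition by range (order_date)
( partition p2007 values less than (date '2008-01-01')
, partition prest values less than (maxvalue)
)
/
create table detail
( id     number(10) constraint dtl_pk primary key
, mst_id number(10) not null constraint dtl_mst_fk references master(id)
) partition by reference (dtl_mst_fk)
/
create table granddetail
( id     number(10) constraint gdt_pk primary key
, dtl_id number(10) not null constraint gdt_dtl_fk references detail(id)
) partition by reference (gdt_dtl_fk)
/

Inserting a granddetail row with a dtl_id that does not exist in detail then raises the ORA-14400 above, where you would have expected ORA-02291.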
3) Fine Grained Dependency Tracking
This feature reduces the odds that your packages become invalid, by storing the dependencies at the column level instead of the table level. It works - I've seen it - but I cannot see the column dependencies anywhere. The obvious place, USER_DEPENDENCIES, does not show any column-level information. The documentation does not seem to mention an alternative place where this information is stored. Hopefully I overlooked something, and I would appreciate it if someone would tell me where to find it. And I don't mean some X$-table, but the regular USER_... views.
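To see the feature itself in action, a small sketch (table and procedure names invented): adding an unrelated column no longer invalidates a dependent procedure:

create table dep_demo (a number)
/
create procedure dep_proc
is
  l dep_demo.a%type;
begin
  select a into l from dep_demo where rownum = 1;
end;
/
alter table dep_demo add (b number)
/
-- before 11g the alter table would have invalidated dep_proc;
-- with fine grained dependency tracking its status stays VALID
select object_name, status from user_objects where object_name = 'DEP_PROC'
/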
4) Native compilation
This one was staring me in the face when I developed the demonstration script, but I missed it. Only after a colleague watching the session pointed it out did I see it too. When compiling a procedure natively, PLSQL_OPTIMIZE_LEVEL has to be at least 2 according to the documentation. So in my script I set it to 1, and show that a warning is generated and the procedure is NOT natively compiled. But when querying the USER_PLSQL_OBJECT_SETTINGS view, it says that it is natively compiled:
rwijk@ORA11G> alter session set plsql_optimize_level = 1
2 /
Session altered.
rwijk@ORA11G> alter session set plsql_code_type = native
2 /
Session altered.
rwijk@ORA11G> create procedure p2
2 is
3 begin
4 for i in 1..100000000
5 loop
6 null;
7 end loop
8 ;
9 dbms_output.put_line('p2');
10 end;
11 /
SP2-0804: Procedure created with compilation warnings
rwijk@ORA11G> show err
Errors for PROCEDURE P2:
LINE/COL ERROR
-------- -----------------------------------------------------------------
0/0 PLW-06014: Native code generation is turned off because
PLSQL_OPTIMIZE_LEVEL <= 1.
rwijk@ORA11G> select name
2 , type
3 , plsql_code_type
4 , plsql_optimize_level
5 , plsql_warnings
6 from user_plsql_object_settings
7 where name like 'P_'
8 order by name
9 /
NAME TYPE PLSQL_CODE_TYPE PLSQL_OPTIMIZE_LEVEL PLSQL_WARNINGS
------------------------------ ------------ ------------------------------ -------------------- ------------------------------
P1 PROCEDURE INTERPRETED 2 ENABLE:ALL
P2 PROCEDURE NATIVE 1 ENABLE:ALL
P3 PROCEDURE NATIVE 2 ENABLE:ALL
Wednesday, December 12, 2007
Oracle 11g for developers
Today I will be doing an in-house knowledge session about Oracle 11g for developers. The PowerPoint presentation and the demonstration scripts can be downloaded here:
Powerpoint presentatie Oracle 11g voor ontwikkelaars
Bijbehorende demonstratiescripts
Warning: they are in Dutch.
Friday, December 7, 2007
Multicolumn statistics
Of all the new features of 11g, some are talked about a lot, like the result cache and virtual columns. I haven't seen much about multicolumn statistics yet, apart from it appearing in several bullet lists about 11g. However, I have seen several performance issues in the past that could have benefited from this feature, so I am quite excited about this one. The feature addresses what Wolfgang Breitling calls "Fallacy II - The Predicate Independence Assumption" in his paper Fallacies of the Cost Based Optimizer. Here is an example of how it works:
Let's create a fake order table:
rwijk@ORA11G> create table orders
2 as
3 select level nr
4 , 'Customer ' || to_char(level) customer_name
5 , case
6 when level <= 500 then 'London'
7 when level <= 550 then 'New York'
8 when level <= 600 then 'Paris'
9 when level <= 650 then 'Vienna'
10 when level <= 700 then 'Moscow'
11 when level <= 750 then 'Berlin'
12 when level <= 800 then 'Rome'
13 when level <= 850 then 'Madrid'
14 when level <= 900 then 'Brussels'
15 else 'Amsterdam'
16 end city
17 , case
18 when level < 500 then 'UK'
19 when level <= 550 then 'USA'
20 when level <= 600 then 'France'
21 when level <= 650 then 'Austria'
22 when level <= 700 then 'Russia'
23 when level <= 750 then 'Germany'
24 when level <= 800 then 'Italy'
25 when level <= 850 then 'Spain'
26 when level <= 900 then 'Belgium'
27 else 'Netherlands'
28 end country
29 from dual
30 connect by level <= 1000
31 /
Table created.
And gather statistics normally.
rwijk@ORA11G> exec dbms_stats.gather_table_stats(user,'orders')
PL/SQL procedure successfully completed.
Because the case expressions in the create table statement may not read easily, I ran these select statements to give you a better idea of what is in the table:
rwijk@ORA11G> select city
2 , count(*)
3 from orders
4 group by city
5 order by city
6 /
CITY COUNT(*)
--------- --------------------------------------
Amsterdam 100
Berlin 50
Brussels 50
London 500
Madrid 50
Moscow 50
New York 50
Paris 50
Rome 50
Vienna 50
10 rows selected.
rwijk@ORA11G> select country
2 , count(*)
3 from orders
4 group by country
5 order by country
6 /
COUNTRY COUNT(*)
----------- --------------------------------------
Austria 50
Belgium 50
France 50
Germany 50
Italy 50
Netherlands 100
Russia 50
Spain 50
UK 499
USA 51
10 rows selected.
rwijk@ORA11G> select city
2 , country
3 , count(*)
4 from orders
5 group by city
6 , country
7 order by city
8 , country
9 /
CITY COUNTRY COUNT(*)
--------- ----------- --------------------------------------
Amsterdam Netherlands 100
Berlin Germany 50
Brussels Belgium 50
London UK 499
London USA 1
Madrid Spain 50
Moscow Russia 50
New York USA 50
Paris France 50
Rome Italy 50
Vienna Austria 50
11 rows selected.
So 10 cities, 10 countries, with a clear dependency between the columns city and country. Although not exactly one-to-one, because - as everybody knows :-) - London is also a town in Ohio, USA. Don't get distracted by this fact too much; it won't play a role in the example.
Ok, so let's have a look at the cardinalities in the following query:
rwijk@ORA11G> explain plan
2 for
3 select *
4 from orders
5 where city = 'London'
6 and country = 'UK'
7 /
Explained.
rwijk@ORA11G> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------
Plan hash value: 1275100350
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 290 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| ORDERS | 10 | 290 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CITY"='London' AND "COUNTRY"='UK')
13 rows selected.
Since histograms are not yet present on the table, the cost based optimizer predicts 10 rows: the column in the filter predicate city = 'London' has 10 distinct values, and the column in the predicate country = 'UK' also has 10 distinct values. Because of the predicate independence assumption, the cost based optimizer calculates 1000 rows * (1/10) * (1/10) = 10 rows.
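The inputs to that calculation can be checked in the dictionary, for example:

select column_name
     , num_distinct
     , histogram
  from user_tab_col_statistics
 where table_name = 'ORDERS'
   and column_name in ('CITY', 'COUNTRY')
/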
It also doesn't matter which value I put in the query:
rwijk@ORA11G> exec dbms_lock.sleep(1)
PL/SQL procedure successfully completed.
rwijk@ORA11G> explain plan
2 for
3 select *
4 from orders
5 where city = 'London'
6 and country = 'Netherlands'
7 /
Explained.
rwijk@ORA11G> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------
Plan hash value: 1275100350
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 290 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| ORDERS | 10 | 290 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CITY"='London' AND "COUNTRY"='Netherlands')
13 rows selected.
Even when searching for London in the Netherlands, it predicts 10. We all know that the query would of course return 0 rows, but the cost based optimizer does not know this.
So let's create histograms on the columns, to have better estimates:
rwijk@ORA11G> exec dbms_stats.gather_table_stats(user,'orders',method_opt=>'FOR ALL COLUMNS')
PL/SQL procedure successfully completed.
rwijk@ORA11G> explain plan
2 for
3 select *
4 from orders
5 where city = 'London'
6 and country = 'UK'
7 /
Explained.
rwijk@ORA11G> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
Plan hash value: 1275100350
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 250 | 7250 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| ORDERS | 250 | 7250 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COUNTRY"='UK' AND "CITY"='London')
13 rows selected.
Now the prediction is 250 rows. This is calculated based on the histogram information that tells the cost based optimizer that London occurs 500 times out of 1000 rows and the UK occurs 499 times out of 1000 rows. 1000 rows * (500/1000) * (499/1000) is approximately 250 rows.
And for the query using London in the Netherlands:
rwijk@ORA11G> exec dbms_lock.sleep(1)
PL/SQL procedure successfully completed.
rwijk@ORA11G> explain plan
2 for
3 select *
4 from orders
5 where city = 'London'
6 and country = 'Netherlands'
7 /
Explained.
rwijk@ORA11G> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------
Plan hash value: 1275100350
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50 | 1450 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| ORDERS | 50 | 1450 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COUNTRY"='Netherlands' AND "CITY"='London')
13 rows selected.
Now 50 rows are calculated, because 1000 rows * (500/1000) * (100/1000) = 50.
The above is the best you could get before 11g. With the use of multicolumn statistics, the row counts will be REALLY accurate:
rwijk@ORA11G> exec dbms_stats.gather_table_stats(user,'orders',method_opt=>'FOR COLUMNS (city,country)')
PL/SQL procedure successfully completed.
rwijk@ORA11G> pause
rwijk@ORA11G> explain plan
2 for
3 select *
4 from orders
5 where city = 'London'
6 and country = 'UK'
7 /
Explained.
rwijk@ORA11G> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------
Plan hash value: 1275100350
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 499 | 15469 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| ORDERS | 499 | 15469 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COUNTRY"='UK' AND "CITY"='London')
13 rows selected.
499 exactly! And for London in the Netherlands:
rwijk@ORA11G> exec dbms_lock.sleep(1)
PL/SQL procedure successfully completed.
rwijk@ORA11G> explain plan
2 for
3 select *
4 from orders
5 where city = 'London'
6 and country = 'Netherlands'
7 /
Explained.
rwijk@ORA11G> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------
Plan hash value: 1275100350
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 31 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| ORDERS | 1 | 31 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COUNTRY"='Netherlands' AND "CITY"='London')
13 rows selected.
Since the optimizer won't calculate with 0, it uses 1, but again this is very accurate! Much more precise cardinality estimates lead to better decisions, for example about whether or not to use an index. It can really make a big difference in a production system. Problem queries can now more often be tuned by gathering the right statistics instead of adding hints to your code.
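Under the covers, gathering statistics this way creates a column group extension on (city,country). As far as I can tell, you can inspect it through the USER_STAT_EXTENSIONS view, and dbms_stats.create_extended_stats offers an explicit alternative to the method_opt syntax used above:

select extension_name
     , extension
  from user_stat_extensions
 where table_name = 'ORDERS'
/
-- explicit alternative: create the column group first, then gather statistics
select dbms_stats.create_extended_stats(user, 'ORDERS', '(CITY,COUNTRY)')
  from dual
/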
Monday, November 26, 2007
Setting up journaling using flashback data archive
One of 11g's new features is Flashback Data Archive, or in marketing terms: Oracle Total Recall. With earlier flashback technologies it was possible to access Oracle's undo information for as long as it was available. With flashback archives you can extend this period for as long as you specify. This makes it possible to implement journaling using this new technique. This blog entry describes how to do that and a number of things you need to be aware of.
Before 11g, journaling was typically implemented using a separate table with the same columns as the base table, plus three additional columns: jn_operation (Insert/Update/Delete), jn_user and jn_datetime. Three after-row triggers on the base table made sure every DML was recorded in the journaling table. So let's see how the same can be achieved using flashback archives.
First of all I create a tablespace especially for flashback archives:
rwijk@ORA11G> create tablespace my_tablespace datafile 'extra_file.dat' size 10M
2 /
Tablespace created.
In the tablespace I can define several flashback archives each with its own retention period. For example:
rwijk@ORA11G> create flashback archive flashback_archive_10_years
2 tablespace my_tablespace
3 retention 10 year
4 /
Flashback archive created.
rwijk@ORA11G> create flashback archive flashback_archive_5_years
2 tablespace my_tablespace
3 retention 5 year
4 /
Flashback archive created.
And I create a table my_emp and attach it to the flashback archive with the 10 year retention period:
rwijk@ORA11G> create table my_emp
2 ( empno number(4) not null
3 , ename varchar2(10)
4 , sal number(7,2)
5 , deptno number(2)
6 ) partition by range (deptno)
7 ( partition deptno10 values less than (11)
8 , partition deptno20 values less than (21)
9 , partition deptno30 values less than (31)
10 , partition deptnorest values less than (maxvalue)
11 ) flashback archive flashback_archive_10_years
12 /
Table created.
The range partitioning is not required; I only do this to show one point later on in this blog entry.
The information about the flashback archives is available in the views DBA_FLASHBACK_ARCHIVE, DBA_FLASHBACK_ARCHIVE_TS and DBA_FLASHBACK_ARCHIVE_TABLES:
rwijk@ORA11G> select * from dba_flashback_archive
2 /
FLASHBACK_ARCHIVE_NAME FLASHBACK_ARCHIVE# RETENTION_IN_DAYS
-------------------------- ------------------ -----------------
CREATE_TIME LAST_PURGE_TIME STATUS
------------------------------ ------------------------------ -------
FLASHBACK_ARCHIVE_5_YEARS 2 1825
27-11-07 23:53:51,000000000 27-11-07 23:53:51,000000000
FLASHBACK_ARCHIVE_10_YEARS 1 3650
27-11-07 23:53:51,000000000 27-11-07 23:53:51,000000000
2 rows selected.
rwijk@ORA11G> select * from dba_flashback_archive_ts
2 /
FLASHBACK_ARCHIVE_NAME FLASHBACK_ARCHIVE# TABLESPACE_NAME
-------------------------- ------------------ ------------------------------
QUOTA_IN_MB
----------------------------------------
FLASHBACK_ARCHIVE_5_YEARS 2 MY_TABLESPACE
FLASHBACK_ARCHIVE_10_YEARS 1 MY_TABLESPACE
2 rows selected.
rwijk@ORA11G> select * from dba_flashback_archive_tables
2 /
TABLE_NAME OWNER_NAME
------------------------------ ------------------------------
FLASHBACK_ARCHIVE_NAME ARCHIVE_TABLE_NAME
-------------------------- ------------------------------
MY_EMP RWIJK
FLASHBACK_ARCHIVE_10_YEARS SYS_FBA_HIST_71097
1 row selected.
Now let's do some DML against this new table:
rwijk@ORA11G> insert into my_emp
2 select empno, ename, sal, deptno from emp
3 /
14 rows created.
rwijk@ORA11G> commit
2 /
Commit complete.
rwijk@ORA11G> update my_emp
2 set sal = sal * 1.1
3 where deptno = 20
4 /
5 rows updated.
rwijk@ORA11G> commit
2 /
Commit complete.
rwijk@ORA11G> update my_emp
2 set ename = initcap(ename)
3 where deptno != 30
4 /
8 rows updated.
rwijk@ORA11G> commit
2 /
Commit complete.
rwijk@ORA11G> insert into my_emp (empno,ename,sal,deptno)
2 values (7777,'VAN WIJK', 1600, 40)
3 /
1 row created.
rwijk@ORA11G> delete my_emp
2 where sal = 3300
3 /
2 rows deleted.
rwijk@ORA11G> commit
2 /
Commit complete.
And use a Flashback Version Query to retrieve all information:
rwijk@ORA11G> select versions_xid
2 , versions_startscn
3 , versions_starttime
4 , versions_endscn
5 , versions_endtime
6 , versions_operation
7 , empno
8 , ename
9 , sal
10 , deptno
11 from my_emp versions between scn minvalue and maxvalue
12 order by empno
13 , versions_startscn
14 /
VERSIONS_XID VERSIONS_STARTSCN VERSIONS_STARTTIME VERSIONS_ENDSCN
---------------- ----------------- -------------------- ---------------
VERSIONS_ENDTIME V EMPNO ENAME SAL DEPTNO
-------------------- - ----- ---------- ----- ------
0A00030079030000 1589358 27-11-07 23:58:39 1589361
27-11-07 23:58:39 I 7369 SMITH 800 20
0400070075030000 1589361 27-11-07 23:58:39 1589365
27-11-07 23:58:39 U 7369 SMITH 880 20
0700110078030000 1589365 27-11-07 23:58:39
U 7369 Smith 880 20
0A00030079030000 1589358 27-11-07 23:58:39
I 7499 ALLEN 1600 30
0A00030079030000 1589358 27-11-07 23:58:39
I 7521 WARD 1250 30
0A00030079030000 1589358 27-11-07 23:58:39 1589361
27-11-07 23:58:39 I 7566 JONES 2975 20
0400070075030000 1589361 27-11-07 23:58:39 1589365
27-11-07 23:58:39 U 7566 JONES 3273 20
0700110078030000 1589365 27-11-07 23:58:39
U 7566 Jones 3273 20
0A00030079030000 1589358 27-11-07 23:58:39
I 7654 MARTIN 1250 30
0A00030079030000 1589358 27-11-07 23:58:39
I 7698 BLAKE 2850 30
09001C00B5040000 1589368 27-11-07 23:58:39
I 7777 VAN WIJK 1600 40
0A00030079030000 1589358 27-11-07 23:58:39 1589365
27-11-07 23:58:39 I 7782 CLARK 2450 10
0700110078030000 1589365 27-11-07 23:58:39
U 7782 Clark 2450 10
0A00030079030000 1589358 27-11-07 23:58:39 1589361
27-11-07 23:58:39 I 7788 SCOTT 3000 20
0400070075030000 1589361 27-11-07 23:58:39 1589365
27-11-07 23:58:39 U 7788 SCOTT 3300 20
0700110078030000 1589365 27-11-07 23:58:39 1589368
27-11-07 23:58:39 U 7788 Scott 3300 20
09001C00B5040000 1589368 27-11-07 23:58:39
D 7788 Scott 3300 20
0A00030079030000 1589358 27-11-07 23:58:39 1589365
27-11-07 23:58:39 I 7839 KING 5000 10
0700110078030000 1589365 27-11-07 23:58:39
U 7839 King 5000 10
0A00030079030000 1589358 27-11-07 23:58:39
I 7844 TURNER 1500 30
0A00030079030000 1589358 27-11-07 23:58:39 1589361
27-11-07 23:58:39 I 7876 ADAMS 1100 20
0400070075030000 1589361 27-11-07 23:58:39 1589365
27-11-07 23:58:39 U 7876 ADAMS 1210 20
0700110078030000 1589365 27-11-07 23:58:39
U 7876 Adams 1210 20
0A00030079030000 1589358 27-11-07 23:58:39
I 7900 JAMES 950 30
0A00030079030000 1589358 27-11-07 23:58:39 1589361
27-11-07 23:58:39 I 7902 FORD 3000 20
0400070075030000 1589361 27-11-07 23:58:39 1589365
27-11-07 23:58:39 U 7902 FORD 3300 20
0700110078030000 1589365 27-11-07 23:58:39 1589368
27-11-07 23:58:39 U 7902 Ford 3300 20
09001C00B5040000 1589368 27-11-07 23:58:39
D 7902 Ford 3300 20
0A00030079030000 1589358 27-11-07 23:58:39 1589365
27-11-07 23:58:39 I 7934 MILLER 1300 10
0700110078030000 1589365 27-11-07 23:58:39
U 7934 Miller 1300 10
30 rows selected.
So indeed all journaling information is available here. All but one piece: the user who modified the data is not there. So let's audit the table to get this piece of information as well, and create a journaling view that joins the audit information with the journaling info.
rwijk@ORA11G> audit insert,update,delete on my_emp by access
2 /
Audit succeeded.
rwijk@ORA11G> create view my_emp_jn_v
2 as
3 select me.versions_operation jn_operation
4 , ao.username jn_user
5 , me.versions_starttime jn_datetime
6 , me.versions_startscn jn_scn
7 , me.empno
8 , me.ename
9 , me.sal
10 , me.deptno
11 from my_emp versions between scn minvalue and maxvalue me
12 , user_audit_object ao
13 where me.versions_xid = ao.transactionid (+)
14 /
View created.
And do some more transactions, including one transaction in another session by a user called SOMEBODY_ELSE:
rwijk@ORA11G> insert into my_emp (empno,ename,sal,deptno)
2 values (7778, 'ROB', 1700, 40)
3 /
1 row created.
rwijk@ORA11G> commit
2 /
Commit complete.
rwijk@ORA11G> create user somebody_else identified by somebody_else
2 /
User created.
rwijk@ORA11G> grant create session to somebody_else
2 /
Grant succeeded.
rwijk@ORA11G> grant delete on my_emp to somebody_else
2 /
Grant succeeded.
rwijk@ORA11G> remark Start another session and connect as somebody_else
rwijk@ORA11G> remark and issue the following commands:
rwijk@ORA11G> remark
rwijk@ORA11G> remark delete rwijk.my_emp where empno = 7777
rwijk@ORA11G> remark /
rwijk@ORA11G> remark commit
rwijk@ORA11G> remark /
rwijk@ORA11G> remark exit
rwijk@ORA11G> pause
rwijk@ORA11G> drop user somebody_else
2 /
User dropped.
rwijk@ORA11G> update my_emp
2 set sal = 1800
3 where empno = 7778
4 /
1 row updated.
rwijk@ORA11G> commit
2 /
Commit complete.
And look, the missing piece of information is also here:
rwijk@ORA11G> select *
2 from my_emp_jn_v
3 order by empno
4 , jn_scn
5 /
J JN_USER JN_DATETIME JN_SCN EMPNO ENAME SAL DEPTNO
- ------------- ------------------- -------- ----- ---------- ----- ------
I 28-11-07 00:05:03 1590771 7369 SMITH 800 20
U 28-11-07 00:05:03 1590774 7369 SMITH 880 20
U 28-11-07 00:05:03 1590778 7369 Smith 880 20
I 28-11-07 00:05:03 1590771 7499 ALLEN 1600 30
I 28-11-07 00:05:03 1590771 7521 WARD 1250 30
I 28-11-07 00:05:03 1590771 7566 JONES 2975 20
U 28-11-07 00:05:03 1590774 7566 JONES 3273 20
U 28-11-07 00:05:03 1590778 7566 Jones 3273 20
I 28-11-07 00:05:03 1590771 7654 MARTIN 1250 30
I 28-11-07 00:05:03 1590771 7698 BLAKE 2850 30
I 28-11-07 00:05:03 1590781 7777 VAN WIJK 1600 40
D SOMEBODY_ELSE 28-11-07 00:05:21 1590838 7777 VAN WIJK 1600 40
I RWIJK 28-11-07 00:05:06 1590799 7778 ROB 1700 40
U RWIJK 28-11-07 00:05:33 1590859 7778 ROB 1800 40
I 28-11-07 00:05:03 1590771 7782 CLARK 2450 10
U 28-11-07 00:05:03 1590778 7782 Clark 2450 10
I 28-11-07 00:05:03 1590771 7788 SCOTT 3000 20
U 28-11-07 00:05:03 1590774 7788 SCOTT 3300 20
U 28-11-07 00:05:03 1590778 7788 Scott 3300 20
D 28-11-07 00:05:03 1590781 7788 Scott 3300 20
I 28-11-07 00:05:03 1590771 7839 KING 5000 10
U 28-11-07 00:05:03 1590778 7839 King 5000 10
I 28-11-07 00:05:03 1590771 7844 TURNER 1500 30
I 28-11-07 00:05:03 1590771 7876 ADAMS 1100 20
U 28-11-07 00:05:03 1590774 7876 ADAMS 1210 20
U 28-11-07 00:05:03 1590778 7876 Adams 1210 20
I 28-11-07 00:05:03 1590771 7900 JAMES 950 30
I 28-11-07 00:05:03 1590771 7902 FORD 3000 20
U 28-11-07 00:05:03 1590774 7902 FORD 3300 20
U 28-11-07 00:05:03 1590778 7902 Ford 3300 20
D 28-11-07 00:05:03 1590781 7902 Ford 3300 20
I 28-11-07 00:05:03 1590771 7934 MILLER 1300 10
U 28-11-07 00:05:03 1590778 7934 Miller 1300 10
33 rows selected.
Note that I also used the SCN in the view, to be able to sort even when the transactions occur within one second.
So it looks very promising... well, until you read this warning in the documentation:
Remember that DDLs that alter the structure of a table (such as drop/modify column, move table, drop partition, truncate table/partition, and add constraint) invalidate any existing undo data for the table. If you try to retrieve data from a time before such a DDL executed, you will get error ORA-1466.
Now this is a serious showstopper: in my experience the tables that need journaling are also the tables that require regular maintenance. And if that maintenance invalidates the existing undo ...
Let's test it (and here is why I range partitioned the table):
rwijk@ORA11G> alter table my_emp drop column comm
2 /
alter table my_emp drop column comm
*
ERROR at line 1:
ORA-55610: Invalid DDL statement on history-tracked table
rwijk@ORA11G> alter table my_emp modify (ename varchar2(50))
2 /
alter table my_emp modify (ename varchar2(50))
*
ERROR at line 1:
ORA-55610: Invalid DDL statement on history-tracked table
rwijk@ORA11G> alter table my_emp drop partition deptno10
2 /
alter table my_emp drop partition deptno10
*
ERROR at line 1:
ORA-55610: Invalid DDL statement on history-tracked table
rwijk@ORA11G> truncate table my_emp
2 /
truncate table my_emp
*
ERROR at line 1:
ORA-55610: Invalid DDL statement on history-tracked table
So Oracle does not even allow us to perform these types of actions, and the existing undo is not harmed:
rwijk@ORA11G> select *
2 from my_emp_jn_v
3 order by empno
4 , jn_scn
5 /
J JN_USER JN_DATETIME JN_SCN EMPNO ENAME SAL DEPTNO
- ------------- ------------------- -------- ----- ---------- ----- ------
I 28-11-07 00:05:03 1590771 7369 SMITH 800 20
U 28-11-07 00:05:03 1590774 7369 SMITH 880 20
U 28-11-07 00:05:03 1590778 7369 Smith 880 20
I 28-11-07 00:05:03 1590771 7499 ALLEN 1600 30
I 28-11-07 00:05:03 1590771 7521 WARD 1250 30
I 28-11-07 00:05:03 1590771 7566 JONES 2975 20
U 28-11-07 00:05:03 1590774 7566 JONES 3273 20
U 28-11-07 00:05:03 1590778 7566 Jones 3273 20
I 28-11-07 00:05:03 1590771 7654 MARTIN 1250 30
I 28-11-07 00:05:03 1590771 7698 BLAKE 2850 30
I 28-11-07 00:05:03 1590781 7777 VAN WIJK 1600 40
D SOMEBODY_ELSE 28-11-07 00:05:21 1590838 7777 VAN WIJK 1600 40
I RWIJK 28-11-07 00:05:06 1590799 7778 ROB 1700 40
U RWIJK 28-11-07 00:05:33 1590859 7778 ROB 1800 40
I 28-11-07 00:05:03 1590771 7782 CLARK 2450 10
U 28-11-07 00:05:03 1590778 7782 Clark 2450 10
I 28-11-07 00:05:03 1590771 7788 SCOTT 3000 20
U 28-11-07 00:05:03 1590774 7788 SCOTT 3300 20
U 28-11-07 00:05:03 1590778 7788 Scott 3300 20
D 28-11-07 00:05:03 1590781 7788 Scott 3300 20
I 28-11-07 00:05:03 1590771 7839 KING 5000 10
U 28-11-07 00:05:03 1590778 7839 King 5000 10
I 28-11-07 00:05:03 1590771 7844 TURNER 1500 30
I 28-11-07 00:05:03 1590771 7876 ADAMS 1100 20
U 28-11-07 00:05:03 1590774 7876 ADAMS 1210 20
U 28-11-07 00:05:03 1590778 7876 Adams 1210 20
I 28-11-07 00:05:03 1590771 7900 JAMES 950 30
I 28-11-07 00:05:03 1590771 7902 FORD 3000 20
U 28-11-07 00:05:03 1590774 7902 FORD 3300 20
U 28-11-07 00:05:03 1590778 7902 Ford 3300 20
D 28-11-07 00:05:03 1590781 7902 Ford 3300 20
I 28-11-07 00:05:03 1590771 7934 MILLER 1300 10
U 28-11-07 00:05:03 1590778 7934 Miller 1300 10
33 rows selected.
Most of the time, maintenance is done by adding one or two columns, with some additional constraints. Adding a column is not a problem for flashback archives, but adding constraints was also listed as being impossible. Let's verify that as well, by adding a primary key, a unique key, a foreign key and a check constraint:
rwijk@ORA11G> alter table my_emp add constraint my_pk primary key (empno)
2 /
Table altered.
rwijk@ORA11G> alter table my_emp add constraint my_uk1 unique (ename)
2 /
Table altered.
rwijk@ORA11G> alter table my_emp add constraint my_fk1 foreign key (deptno) references dept(deptno)
2 /
Table altered.
rwijk@ORA11G> alter table my_emp add constraint ck1 check (sal>500)
2 /
Table altered.
So, was the documentation wrong? Is it possible after all?
No.
The undo has been sneakily invalidated:
rwijk@ORA11G> select *
2 from my_emp_jn_v
3 order by empno
4 , jn_scn
5 /
J JN_USER JN_DATETIME JN_SCN EMPNO ENAME SAL DEPTNO
- ------------- ------------------- -------- ----- ---------- ----- ------
7369 Smith 880 20
7499 ALLEN 1600 30
7521 WARD 1250 30
7566 Jones 3273 20
7654 MARTIN 1250 30
7698 BLAKE 2850 30
7778 ROB 1800 40
7782 Clark 2450 10
7839 King 5000 10
7844 TURNER 1500 30
7876 Adams 1210 20
7900 JAMES 950 30
7934 Miller 1300 10
13 rows selected.
So if you make the mistake of adding a constraint, you lose all journaling information... This sure is something to be aware of. I think adding a constraint to a table with a flashback archive should also raise ORA-55610. Hopefully Oracle will fix this in a future release or patch set.
The only way I can think of to keep the journaling information while still being able to do the necessary maintenance is to create a journaling table into which you dump the information from the journaling view before invalidating the existing undo. Far from ideal, but doable as a workaround.
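A minimal sketch of that workaround, reusing the my_emp_jn_v journaling view from earlier (the backup table name is made up):
create table my_emp_jn_backup
as
select *
from my_emp_jn_v
/
-- now perform the maintenance DDL; afterwards, historical reports can
-- union my_emp_jn_backup with the journaling view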
But what do we gain by implementing journaling this way?
The answer is simple: performance. Database triggers are no longer necessary. And because database triggers are evil, you win a lot, as can be seen in a little test in which I create three tables: one regular table, one table using a flashback archive and auditing, and one regular table with an after row trigger that populates a second journaling table:
rwijk@ORA11G> create flashback archive flashback_archive_10_years
2 tablespace my_tablespace
3 retention 10 year
4 /
Flashback archive created.
rwijk@ORA11G> create table my_emp_regular
2 ( empno number(6) not null
3 , ename varchar2(10)
4 , sal number(8,2)
5 , deptno number(2)
6 )
7 /
Table created.
rwijk@ORA11G> create table my_emp_flashback_archive
2 ( empno number(6) not null
3 , ename varchar2(10)
4 , sal number(8,2)
5 , deptno number(2)
6 ) flashback archive flashback_archive_10_years
7 /
Table created.
rwijk@ORA11G> audit insert,update,delete on my_emp_flashback_archive by access
2 /
Audit succeeded.
rwijk@ORA11G> create table my_emp_bad_old_journaling
2 ( empno number(6) not null
3 , ename varchar2(10)
4 , sal number(8,2)
5 , deptno number(2)
6 )
7 /
Table created.
rwijk@ORA11G> create table my_emp_bad_old_journaling_jn
2 ( jn_operation varchar2(1)
3 , jn_user varchar2(30)
4 , jn_datetime date
5 , jn_scn number
6 , empno number(6) not null
7 , ename varchar2(10)
8 , sal number(8,2)
9 , deptno number(2)
10 )
11 /
Table created.
rwijk@ORA11G> create trigger jn_ari
2 after insert on my_emp_bad_old_journaling
3 for each row
4 begin
5 insert into my_emp_bad_old_journaling_jn
6 ( jn_operation
7 , jn_user
8 , jn_datetime
9 , jn_scn
10 , empno
11 , ename
12 , sal
13 , deptno
14 )
15 values
16 ( 'I'
17 , user
18 , sysdate
19 , to_number(dbms_flashback.get_system_change_number)
20 , :new.empno
21 , :new.ename
22 , :new.sal
23 , :new.deptno
24 );
25 end;
26 /
Trigger created.
rwijk@ORA11G> set timing on
rwijk@ORA11G> insert into my_emp_regular
2 select level
3 , 'EMP' || to_char(level)
4 , 1000 + level
5 , 10 * (1 + mod(level,4))
6 from dual
7 connect by level <= 100000
8 /
100000 rows created.
Elapsed: 00:00:00.48
rwijk@ORA11G> insert into my_emp_flashback_archive
2 select level
3 , 'EMP' || to_char(level)
4 , 1000 + level
5 , 10 * (1 + mod(level,4))
6 from dual
7 connect by level <= 100000
8 /
100000 rows created.
Elapsed: 00:00:00.50
rwijk@ORA11G> insert into my_emp_bad_old_journaling
2 select level
3 , 'EMP' || to_char(level)
4 , 1000 + level
5 , 10 * (1 + mod(level,4))
6 from dual
7 connect by level <= 100000
8 /
100000 rows created.
Elapsed: 00:00:13.39
As expected, there is almost no overhead when using flashback data archives, which makes them a lot faster than the old-fashioned way of journaling. But the maintenance issue described earlier is something to worry about in advance.
Further reading:
Yuri van Buren about Flashback Data Archive
Oracle 11g Total Recall - Flashback in the hands of Database Designers and Application Developers, at last. And: the end of Journalling Tables by Lucas Jellema
Friday, November 16, 2007
SQL Model Unconference
In an "Unconference" session at Oracle Open World, Laurent Schneider did a session about the SQL Model Clause. He posted his demonstration script on his site here.
If you have been reading my SQL Model Clause Tutorial, part one and SQL Model Clause Tutorial, part two, then I recommend running this script as well to see some interesting queries. You only need the EMP and DEPT tables. A nice catch for me was the use of a constant in the PARTITION BY clause.
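To illustrate that catch, a small made-up example (not from Laurent's script): the aliased constant in the PARTITION BY clause puts all EMP rows in one partition, so the aggregate in the rule ranges over the whole table:
select ename
, sal
, total
from emp
model
partition by (1 as grp)
dimension by (row_number() over (order by ename) rn)
measures (ename, sal, 0 total)
rules
( total[any] = sum(sal)[any]
)
order by ename
/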
Thursday, November 15, 2007
SAVE EXCEPTIONS
Since version 9 it has been possible to do bulk DML with FORALL using the SAVE EXCEPTIONS clause. It makes sure that all invalid rows - the exceptions - are saved into the sql%bulk_exceptions array, while all valid rows are still processed. This array stores a record for each invalid row, containing an ERROR_INDEX, the iteration number during the FORALL statement, and an ERROR_CODE, which corresponds to the Oracle error code.
The error text is not stored. The documentation says:
The individual error messages, or any substitution arguments, are not saved, but the error message text can be looked up using ERROR_CODE with SQLERRM ...
Looks reasonable, but in our shop we validate lots of business rules with triggers. When a business rule is violated, we do a RAISE_APPLICATION_ERROR(-20000,'APP-12345');
On the client side (Webforms), the error message is looked up in a messages table and a friendly message is shown. When coding a FORALL with SAVE EXCEPTIONS in such an environment, the error messages become useless, as can be seen in the next example:
rwijk@ORA11G> create table mytable
2 ( id number(4)
3 , name varchar2(30)
4 )
5 /
Table created.
rwijk@ORA11G> create trigger mytable_bri
2 before insert on mytable
3 for each row
4 begin
5 if :new.id = 2
6 then
7 raise_application_error(-20000,'APP-12345');
8 elsif :new.id = 9
9 then
10 raise_application_error(-20000,'APP-98765');
11 end if;
12 end;
13 /
Trigger created.
rwijk@ORA11G> alter table mytable add constraint mytable_ck1 check (id <> 6)
2 /
Table altered.
rwijk@ORA11G> declare
2 e_forall_error exception;
3 pragma exception_init(e_forall_error,-24381)
4 ;
5 type t_numbers is table of mytable.id%type;
6 l_numbers t_numbers := t_numbers(1,2,3,4,5,6,7,8,9,10)
7 ;
8 begin
9 forall i in 1..l_numbers.count save exceptions
10 insert into mytable
11 ( id
12 , name
13 )
14 values
15 ( l_numbers(i)
16 , 'Name' || to_char(l_numbers(i))
17 )
18 ;
19 exception
20 when e_forall_error then
21 for i in 1..sql%bulk_exceptions.count
22 loop
23 dbms_output.put_line('SQLCODE: ' || sql%bulk_exceptions(i).error_code);
24 dbms_output.put_line('SQLERRM: ' || sqlerrm(-sql%bulk_exceptions(i).error_code));
25 dbms_output.new_line;
26 end loop;
27 end;
28 /
SQLCODE: 20000
SQLERRM: ORA-20000:
SQLCODE: 2290
SQLERRM: ORA-02290: check constraint (.) violated
SQLCODE: 20000
SQLERRM: ORA-20000:
PL/SQL procedure successfully completed.
rwijk@ORA11G> select id, name from mytable
2 /
ID NAME
---------- ------------------------------
1 Name1
3 Name3
4 Name4
5 Name5
7 Name7
8 Name8
10 Name10
7 rows selected.
Note how the SQLERRM message doesn't return anything useful and how the name of the check constraint has disappeared. This is really annoying and can't be circumvented easily in 9i. For better error messages we would have to go back to row-by-row processing. And that means: very slow.
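For completeness, the slow row-by-row variant would look roughly like this (a sketch, mirroring the FORALL example above):
declare
  type t_numbers is table of mytable.id%type;
  l_numbers t_numbers := t_numbers(1,2,3,4,5,6,7,8,9,10);
begin
  for i in 1..l_numbers.count
  loop
    begin
      insert into mytable
      ( id
      , name
      )
      values
      ( l_numbers(i)
      , 'Name' || to_char(l_numbers(i))
      );
    exception
      when others then
        -- inside the handler, sqlerrm contains the full message,
        -- including the APP-12345/APP-98765 text raised by the trigger
        dbms_output.put_line('SQLERRM: ' || sqlerrm);
    end;
  end loop;
end;
/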
However, version 10gR2 introduced a feature called DML error logging. I remember reading about it more than two years ago here on Tom Kyte's blog. In that entry and in the documentation you only see plain SQL examples, not PL/SQL examples using FORALL. But luckily it works there as well:
rwijk@ORA11G> rollback
2 /
Rollback complete.
rwijk@ORA11G> exec dbms_errlog.create_error_log('mytable')
PL/SQL procedure successfully completed.
rwijk@ORA11G> declare
2 type t_numbers is table of mytable.id%type;
3 l_numbers t_numbers := t_numbers(1,2,3,4,5,6,7,8,9,10)
4 ;
5 begin
6 forall i in 1..l_numbers.count
7 insert into mytable
8 ( id
9 , name
10 )
11 values
12 ( l_numbers(i)
13 , 'Name' || to_char(l_numbers(i))
14 )
15 log errors reject limit unlimited
16 ;
17 for r in
18 ( select ora_err_number$
19 , ora_err_mesg$
20 from err$_mytable
21 )
22 loop
23 dbms_output.put_line('SQLCODE: ' || to_char(r.ora_err_number$));
24 dbms_output.put_line('SQLERRM: ' || r.ora_err_mesg$);
25 dbms_output.new_line;
26 end loop
27 ;
28 end;
29 /
SQLCODE: 20000
SQLERRM: ORA-20000: APP-12345
ORA-06512: at "RWIJK.MYTABLE_BRI", line 4
ORA-04088: error during execution of trigger 'RWIJK.MYTABLE_BRI'
SQLCODE: 2290
SQLERRM: ORA-02290: check constraint (RWIJK.MYTABLE_CK1) violated
SQLCODE: 20000
SQLERRM: ORA-20000: APP-98765
ORA-06512: at "RWIJK.MYTABLE_BRI", line 7
ORA-04088: error during execution of trigger 'RWIJK.MYTABLE_BRI'
PL/SQL procedure successfully completed.
rwijk@ORA11G> select id, name from mytable
2 /
ID NAME
---------- ------------------------------
1 Name1
3 Name3
4 Name4
5 Name5
7 Name7
8 Name8
10 Name10
7 rijen zijn geselecteerd.
And you do get to see the error messages (APP-12345 and APP-98765) and the name of the check constraint. Unfortunately, our shop still uses 9.2.0.7...
UPDATE
It is worth knowing that there are some restrictions when working with the DML error logging clause. From the 11g documentation (the same text appears in the 10gR2 documentation):
Restrictions on DML Error Logging
* The following conditions cause the statement to fail and roll back without invoking the error logging capability:
o Violated deferred constraints.
o Any direct-path INSERT or MERGE operation that raises a unique constraint or index violation (see the sketch after this list).
o Any UPDATE or MERGE operation that raises a unique constraint or index violation.
* You cannot track errors in the error logging table for LONG, LOB, or object type columns. However, the table that is the target of the DML operation can contain these types of columns.
o If you create or modify the corresponding error logging table so that it contains a column of an unsupported type, and if the name of that column corresponds to an unsupported column in the target DML table, then the DML statement fails at parse time.
o If the error logging table does not contain any unsupported column types, then all DML errors are logged until the reject limit of errors is reached. For rows on which errors occur, column values with corresponding columns in the error logging table are logged along with the control information.
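As a sketch of the direct-path restriction mentioned above (hypothetically assuming mytable had a unique constraint on its id column), a direct-path insert that violates that constraint makes the whole statement fail and roll back, and nothing is written to err$_mytable:
insert /*+ append */ into mytable
select id
, name
from mytable
log errors reject limit unlimited
/
-- would fail with ORA-00001: unique constraint violated;
-- the statement rolls back and err$_mytable remains empty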
Wednesday, November 14, 2007
Interval based row generation
Well, I didn't have a good name for this common SQL problem, so I decided to call it "interval based row generation". If someone out there knows a better or more common name for the problem, then please say so by leaving a comment.
An example says more than a thousand words, so here is an example of what I mean by the term:
Suppose I have this table:
rwijk@ORA10GR2> create table mytable (code,start_date,end_date,value)
2 as
3 select 'group1', date '2007-01-01', date '2007-01-10', 11 from dual union all
4 select 'group2', date '2007-01-01', date '2007-01-01', 22 from dual union all
5 select 'group3', date '2007-01-01', date '2007-01-03', 33 from dual
6 /
Table created.
And I want to generate rows for all dates between and including the start_date and the end_date. The query would have to generate 10 rows for group1, 1 row for group2 and 3 rows for group3. I know of six ways to do this type of query. There may be more of course. Here are the six alternatives:
rwijk@ORA10GR2> select /* alternative 1: cartesian product with hard coded upper bound */
2 m.code
3 , m.start_date + l as day
4 , m.value
5 from mytable m
6 , (select level-1 l from dual connect by level <= 1000)
7 where l <= m.end_date - m.start_date
8 order by m.code
9 , day
10 /
CODE DAY VALUE
---------- ------------------- ----------
group1 01-01-2007 00:00:00 11
group1 02-01-2007 00:00:00 11
group1 03-01-2007 00:00:00 11
group1 04-01-2007 00:00:00 11
group1 05-01-2007 00:00:00 11
group1 06-01-2007 00:00:00 11
group1 07-01-2007 00:00:00 11
group1 08-01-2007 00:00:00 11
group1 09-01-2007 00:00:00 11
group1 10-01-2007 00:00:00 11
group2 01-01-2007 00:00:00 22
group3 01-01-2007 00:00:00 33
group3 02-01-2007 00:00:00 33
group3 03-01-2007 00:00:00 33
14 rows selected.
rwijk@ORA10GR2> select /* alternative 2: cartesian product with calculated upper bound */
2 m.code
3 , m.start_date + l as day
4 , m.value
5 from mytable m
6 , ( select level-1 l
7 from (select max(end_date-start_date+1) maxinterval from mytable)
8 connect by level <= maxinterval
9 )
10 where l <= m.end_date - m.start_date
11 order by m.code
12 , day
13 /
CODE DAY VALUE
---------- ------------------- ----------
group1 01-01-2007 00:00:00 11
group1 02-01-2007 00:00:00 11
group1 03-01-2007 00:00:00 11
group1 04-01-2007 00:00:00 11
group1 05-01-2007 00:00:00 11
group1 06-01-2007 00:00:00 11
group1 07-01-2007 00:00:00 11
group1 08-01-2007 00:00:00 11
group1 09-01-2007 00:00:00 11
group1 10-01-2007 00:00:00 11
group2 01-01-2007 00:00:00 22
group3 01-01-2007 00:00:00 33
group3 02-01-2007 00:00:00 33
group3 03-01-2007 00:00:00 33
14 rows selected.
rwijk@ORA10GR2> select /* alternative 3: hierarchical query with prior dbms_random.value */
2 m.code
3 , m.start_date + level - 1 as day
4 , m.value
5 from mytable m
6 connect by level <= m.end_date - m.start_date + 1
7 and prior m.code = m.code
8 and prior dbms_random.value is not null
9 order by m.code
10 , day
11 /
CODE DAY VALUE
---------- ------------------- ----------
group1 01-01-2007 00:00:00 11
group1 02-01-2007 00:00:00 11
group1 03-01-2007 00:00:00 11
group1 04-01-2007 00:00:00 11
group1 05-01-2007 00:00:00 11
group1 06-01-2007 00:00:00 11
group1 07-01-2007 00:00:00 11
group1 08-01-2007 00:00:00 11
group1 09-01-2007 00:00:00 11
group1 10-01-2007 00:00:00 11
group2 01-01-2007 00:00:00 22
group3 01-01-2007 00:00:00 33
group3 02-01-2007 00:00:00 33
group3 03-01-2007 00:00:00 33
14 rows selected.
rwijk@ORA10GR2> select /* alternative 4: hierarchical query with connect_by_root */
2 m.code
3 , m.start_date + level - 1 as day
4 , m.value
5 from mytable m
6 connect by connect_by_root m.code = m.code
7 and level <= m.end_date - m.start_date + 1
8 order by m.code
9 , day
10 /
CODE DAY VALUE
---------- ------------------- ----------
group1 01-01-2007 00:00:00 11
group1 02-01-2007 00:00:00 11
group1 03-01-2007 00:00:00 11
group1 04-01-2007 00:00:00 11
group1 05-01-2007 00:00:00 11
group1 06-01-2007 00:00:00 11
group1 07-01-2007 00:00:00 11
group1 08-01-2007 00:00:00 11
group1 09-01-2007 00:00:00 11
group1 10-01-2007 00:00:00 11
group2 01-01-2007 00:00:00 22
group3 01-01-2007 00:00:00 33
group3 02-01-2007 00:00:00 33
group3 03-01-2007 00:00:00 33
14 rows selected.
rwijk@ORA10GR2> select /* alternative 5: Adrian Billington's multiset */
2 m.code
3 , m.start_date + (row_number() over (partition by m.code order by null)-1) as day
4 , m.value
5 from mytable m
6 , table
7 ( cast
8 ( multiset(select null from dual connect by rownum <= (m.end_date-m.start_date)+1)
9 as sys.dbms_debug_vc2coll
10 )
11 ) t
12 order by m.code
13 , day
14 /
CODE DAY VALUE
---------- ------------------- ----------
group1 01-01-2007 00:00:00 11
group1 02-01-2007 00:00:00 11
group1 03-01-2007 00:00:00 11
group1 04-01-2007 00:00:00 11
group1 05-01-2007 00:00:00 11
group1 06-01-2007 00:00:00 11
group1 07-01-2007 00:00:00 11
group1 08-01-2007 00:00:00 11
group1 09-01-2007 00:00:00 11
group1 10-01-2007 00:00:00 11
group2 01-01-2007 00:00:00 22
group3 01-01-2007 00:00:00 33
group3 02-01-2007 00:00:00 33
group3 03-01-2007 00:00:00 33
14 rows selected.
rwijk@ORA10GR2> select /* alternative 6: model clause */
2 code
3 , day
4 , value
5 from mytable
6 model
7 partition by (code,value)
8 dimension by (0 i)
9 measures (start_date day, end_date)
10 rules
11 ( day[for i from 1 to end_date[0] - day[0] increment 1] = day[0] + cv(i)
12 )
13 order by code
14 , day
15 /
CODE DAY VALUE
---------- ------------------- ----------
group1 01-01-2007 00:00:00 11
group1 02-01-2007 00:00:00 11
group1 03-01-2007 00:00:00 11
group1 04-01-2007 00:00:00 11
group1 05-01-2007 00:00:00 11
group1 06-01-2007 00:00:00 11
group1 07-01-2007 00:00:00 11
group1 08-01-2007 00:00:00 11
group1 09-01-2007 00:00:00 11
group1 10-01-2007 00:00:00 11
group2 01-01-2007 00:00:00 22
group3 01-01-2007 00:00:00 33
group3 02-01-2007 00:00:00 33
group3 03-01-2007 00:00:00 33
14 rows selected.
I noticed something remarkable when running the same queries on my Oracle11g database. Alternatives 1, 2, 5 and 6 ran as expected, but 3 and 4 did not work anymore:
rwijk@ORA11G> select /* alternative 3: hierarchical query with prior dbms_random.value */
2 m.code
3 , m.start_date + level - 1 as day
4 , m.value
5 from mytable m
6 connect by level <= m.end_date - m.start_date + 1
7 and prior m.code = m.code
8 and prior dbms_random.value is not null
9 order by m.code
10 , day
11 /
from mytable m
*
ERROR at line 5:
ORA-01436: CONNECT BY loop in user data
rwijk@ORA11G> select /* alternative 4: hierarchical query with connect_by_root */
2 m.code
3 , m.start_date + level - 1 as day
4 , m.value
5 from mytable m
6 connect by connect_by_root m.code = m.code
7 and level <= m.end_date - m.start_date + 1
8 order by m.code
9 , day
10 /
connect by connect_by_root m.code = m.code
*
ERROR at line 6:
ORA-30007: CONNECT BY ROOT operator is not supported in the START WITH or CONNECT BY condition
Alternative 3 uses a trick: the "prior dbms_random.value is not null" condition on line 8 prevented the ORA-01436 (CONNECT BY loop in user data) error in versions 9 and 10, but no longer does in 11.
Alternative 4 shows a new error message introduced in Oracle11g: ORA-30007. Apparently the use of the connect_by_root operator has been restricted further. I don't know why, although Oracle probably has a good reason for the restriction.
In this recent OTN-thread Adrian Billington (of the excellent oracle-developer.net site) showed me alternative 5, which I had not seen before. He also did a small performance test showing that alternative 5 is slightly better than alternative 6, the model clause solution. I decided to investigate a little more by tracing/tkprof and by comparing latches with Tom Kyte's runstats_pkg.
Here are the results of a test with this table:
create table mytable (code,start_date,end_date,value)
as
select 'group' || to_char(level)
, trunc(sysdate)
, trunc(sysdate) + power(ceil(dbms_random.value(0,30)),2)
, 'value' || to_char(level)
from dual
connect by level <= 100
/
Tkprof on Oracle11g shows:
********************************************************************************
select /* alternative 1: cartesian product with hard coded upper bound */
m.code
, m.start_date + l as day
, m.value
from mytable m
, (select level-1 l from dual connect by level <= 1000)
where l <= m.end_date - m.start_date
order by m.code
, day
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 1 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2415 0.31 0.29 0 3000 0 36197
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2417 0.31 0.29 0 3001 0 36197
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 81
Rows Row Source Operation
------- ---------------------------------------------------
36197 SORT ORDER BY (cr=3000 pr=0 pw=0 time=806 us cost=6 size=395 card=5)
36197 NESTED LOOPS (cr=3000 pr=0 pw=0 time=2220 us cost=5 size=395 card=5)
1000 VIEW (cr=0 pr=0 pw=0 time=67 us cost=2 size=13 card=1)
1000 CONNECT BY WITHOUT FILTERING (cr=0 pr=0 pw=0 time=27 us)
1 FAST DUAL (cr=0 pr=0 pw=0 time=0 us cost=2 size=0 card=1)
36197 TABLE ACCESS FULL MYTABLE (cr=3000 pr=0 pw=0 time=4259 us cost=3 size=330 card=5)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2415 0.00 0.00
SQL*Net message from client 2415 0.00 0.52
********************************************************************************
select /* alternative 2: cartesian product with calculated upper bound */
m.code
, m.start_date + l as day
, m.value
from mytable m
, ( select level-1 l
from (select max(end_date-start_date+1) maxinterval from mytable)
connect by level <= maxinterval
)
where l <= m.end_date - m.start_date
order by m.code
, day
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 3 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2415 0.26 0.28 0 2706 0 36197
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2417 0.28 0.29 0 2709 0 36197
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 81
Rows Row Source Operation
------- ---------------------------------------------------
36197 SORT ORDER BY (cr=2706 pr=0 pw=0 time=797 us cost=7 size=395 card=5)
36197 NESTED LOOPS (cr=2706 pr=0 pw=0 time=2251 us cost=6 size=395 card=5)
901 VIEW (cr=3 pr=0 pw=0 time=66 us cost=3 size=13 card=1)
901 CONNECT BY WITHOUT FILTERING (cr=3 pr=0 pw=0 time=28 us)
1 VIEW (cr=3 pr=0 pw=0 time=0 us cost=3 size=13 card=1)
1 SORT AGGREGATE (cr=3 pr=0 pw=0 time=0 us)
100 TABLE ACCESS FULL MYTABLE (cr=3 pr=0 pw=0 time=2 us cost=3 size=1800 card=100)
36197 TABLE ACCESS FULL MYTABLE (cr=2703 pr=0 pw=0 time=4266 us cost=3 size=330 card=5)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2415 0.00 0.00
SQL*Net message from client 2415 0.00 0.52
********************************************************************************
select /* alternative 5: Adrian Billington's multiset */
m.code
, m.start_date + (row_number() over (partition by m.code order by null)-1) as day
, m.value
from mytable m
, table
( cast
( multiset(select null from dual connect by rownum <= (m.end_date-m.start_date)+1)
as sys.dbms_debug_vc2coll
)
) t
order by m.code
, day
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 1 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2415 0.35 0.36 0 3 0 36197
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2417 0.35 0.36 0 4 0 36197
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 81
Rows Row Source Operation
------- ---------------------------------------------------
36197 SORT ORDER BY (cr=3 pr=0 pw=0 time=749 us cost=28444 size=53908800 card=816800)
36197 WINDOW SORT (cr=3 pr=0 pw=0 time=902 us cost=28444 size=53908800 card=816800)
36197 NESTED LOOPS (cr=3 pr=0 pw=0 time=4381 us cost=2726 size=53908800 card=816800)
100 TABLE ACCESS FULL MYTABLE (cr=3 pr=0 pw=0 time=4 us cost=3 size=6600 card=100)
36197 COLLECTION ITERATOR SUBQUERY FETCH (cr=0 pr=0 pw=0 time=4111 us)
36197 COUNT (cr=0 pr=0 pw=0 time=2600 us)
36197 CONNECT BY WITHOUT FILTERING (cr=0 pr=0 pw=0 time=962 us)
100 FAST DUAL (cr=0 pr=0 pw=0 time=0 us cost=2 size=0 card=1)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2415 0.00 0.00
SQL*Net message from client 2415 0.00 0.53
********************************************************************************
select /* alternative 6: model clause */
code
, day
, value
from mytable
model
partition by (code,value)
dimension by (0 i)
measures (start_date day, end_date)
rules
( day[for i from 1 to end_date[0] - day[0] increment 1] = day[0] + cv(i)
)
order by code
, day
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 1 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2415 0.46 0.51 0 3 0 36197
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2417 0.46 0.51 0 4 0 36197
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 81
Rows Row Source Operation
------- ---------------------------------------------------
36197 SORT ORDER BY (cr=3 pr=0 pw=0 time=771 us cost=4 size=6600 card=100)
36197 SQL MODEL ORDERED (cr=3 pr=0 pw=0 time=697 us cost=4 size=6600 card=100)
100 TABLE ACCESS FULL MYTABLE (cr=3 pr=0 pw=0 time=3 us cost=3 size=6600 card=100)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2415 0.00 0.00
SQL*Net message from client 2415 0.00 0.53
********************************************************************************
So, the clunky cartesian joins - alternatives 1 and 2 - seem to outperform the multiset and model clause solutions. These alternatives just generate all rows possibly needed and filter out the unneeded ones.
Next I compared alternatives 2 and 5 using runstats_pkg:
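For reference, such a runstats comparison is driven like this (a sketch; rs_start, rs_middle and rs_stop are the procedures of Tom Kyte's runstats package, and the 50 passed to rs_stop suppresses statistics that differ by less than 50):
exec runstats_pkg.rs_start
-- run alternative 2 here
exec runstats_pkg.rs_middle
-- run alternative 5 here
exec runstats_pkg.rs_stop(50)
The output of the final call: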
rwijk@ORA11G> exec runstats_pkg.rs_stop(50)
Run1 ran in 78 hsecs
Run2 ran in 86 hsecs
Run1 ran in 90.7% of the time
Name Run1 Run2 Difference
LATCH.shared pool 156 222 66
LATCH.enqueues 80 14 -66
LATCH.enqueue hash chains 82 15 -67
STAT.undo change vector size 2,732 2,804 72
STAT.redo size 3,656 3,740 84
STAT.sorts (memory) 2 102 100
STAT.workarea executions - optimal 6 107 101
STAT.no work - consistent read gets 910 9 -901
STAT.table scan blocks gotten 902 1 -901
STAT.table scans (short tables) 902 1 -901
STAT.calls to get snapshot scn: kcmgss 1,811 9 -1,802
LATCH.cache buffers chains 2,856 163 -2,693
STAT.session logical reads 2,753 53 -2,700
STAT.consistent gets 2,727 26 -2,701
STAT.consistent gets from cache 2,727 26 -2,701
STAT.consistent gets from cache (fastpath) 2,720 18 -2,702
STAT.sorts (rows) 36,207 72,512 36,305
STAT.table scan rows gotten 90,200 100 -90,100
Run1 latches total versus run2 -- difference and pct
Run1 Run2 Difference Pct
8,692 5,907 -2,785 147.15%
So alternative 2 is faster wall-clock wise, but less scalable than alternative 5.
Now comparing 5 and 6:
rwijk@ORA11G> exec runstats_pkg.rs_stop(50)
Run1 ran in 73 hsecs
Run2 ran in 89 hsecs
Run1 ran in 82.02% of the time
Name Run1 Run2 Difference
STAT.undo change vector size 2,868 2,804 -64
LATCH.SQL memory manager workarea list latch 74 9 -65
STAT.bytes received via SQL*Net from client 24,549 24,484 -65
LATCH.enqueues 2 76 74
LATCH.enqueue hash chains 3 78 75
STAT.redo size 3,876 3,784 -92
STAT.workarea executions - optimal 107 6 -101
STAT.sorts (memory) 102 1 -101
STAT.sorts (rows) 62,756 31,328 -31,428
Run1 latches total versus run2 -- difference and pct
Run1 Run2 Difference Pct
4,700 4,811 111 97.69%
The conclusion is that Adrian's multiset solution is the close winner. The cartesian product solution will appeal more when the variation in interval lengths is small and the percentage of filtered rows is low. The model clause solution only wins in readability, I guess, although some of you probably won't agree :-)