Tuesday, July 12, 2011

Partitioned outer join bug

I have experienced a strange bug when constructing an example showing how a partitioned outer join works. I used a setup which resembles a situation I have consulted about recently for timeseries data. The table can hold multiple timeseries whose data is coming from multiple sources. The data itself contains volumes that are reported for every 5 minutes. This table looks like this:

SQL> create table tradevolumes
2 ( timeseries_id number(4)
3 , tradedate date
4 , source_id number(4)
5 , volume number(4)
6 , constraint tradevolumes_pk primary key (timeseries_id,tradedate,source_id)
7 ) organization index
8 /

Table created.


An index organized table, which we fill with some data: 2 timeseries from 3 sources, and 108 five-minute-periods:

SQL> insert into tradevolumes
2 select 1 + mod(level-1,2)
3 , trunc(sysdate) + numtodsinterval(5*mod(trunc((level-1)/2),108),'minute')
4 , ceil(level/(2 * 108))
5 , ceil(dbms_random.value(0,1000))
6 from dual
7 connect by level <= 2 * 108 * 3
8 /

648 rows created.


A common problem for timeseries data is missing data. I'll simulate that by deleting several rows:

SQL> delete tradevolumes
2 where ( tradedate
3 between trunc(sysdate) + interval '35' minute
4 and trunc(sysdate) + interval '50' minute
5 or (timeseries_id = 1 and source_id = 3 and volume < 30)
6 or (timeseries_id = 2 and source_id = 2 and tradedate > trunc(sysdate) + interval '20' minute)
7 )
8 /

127 rows deleted.


And now gather some basic statistics for the table:

SQL> exec dbms_stats.gather_table_stats(user,'tradevolumes')

PL/SQL procedure successfully completed.


To get an idea of the data that is now in the table:

SQL> select *
2 from tradevolumes
3 order by timeseries_id
4 , tradedate
5 , source_id
6 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 400
1 12-07-2011 00:00:00 2 265
1 12-07-2011 00:00:00 3 307
1 12-07-2011 00:05:00 1 223
1 12-07-2011 00:05:00 2 975
1 12-07-2011 00:05:00 3 173
1 12-07-2011 00:10:00 1 794
1 12-07-2011 00:10:00 2 17
1 12-07-2011 00:10:00 3 468
1 12-07-2011 00:15:00 1 552
1 12-07-2011 00:15:00 2 979
1 12-07-2011 00:15:00 3 84
1 12-07-2011 00:20:00 1 73
1 12-07-2011 00:20:00 2 710
1 12-07-2011 00:20:00 3 54
1 12-07-2011 00:25:00 1 271
1 12-07-2011 00:25:00 2 643
1 12-07-2011 00:25:00 3 443
1 12-07-2011 00:30:00 1 408
1 12-07-2011 00:30:00 2 526
1 12-07-2011 00:30:00 3 725
1 12-07-2011 00:55:00 1 13
1 12-07-2011 00:55:00 2 943
1 12-07-2011 00:55:00 3 319
... skipped a lot of rows here ...
1 12-07-2011 08:50:00 1 239
1 12-07-2011 08:50:00 2 502
1 12-07-2011 08:50:00 3 297
1 12-07-2011 08:55:00 1 620
1 12-07-2011 08:55:00 2 531
1 12-07-2011 08:55:00 3 377
2 12-07-2011 00:00:00 1 778
2 12-07-2011 00:00:00 2 302
2 12-07-2011 00:00:00 3 959
2 12-07-2011 00:05:00 1 920
2 12-07-2011 00:05:00 2 827
2 12-07-2011 00:05:00 3 395
... skipped a lot of rows here ...
2 12-07-2011 08:45:00 1 495
2 12-07-2011 08:45:00 3 519
2 12-07-2011 08:50:00 1 556
2 12-07-2011 08:50:00 3 193
2 12-07-2011 08:55:00 1 368
2 12-07-2011 08:55:00 3 18

521 rows selected.


To show how a partitioned outer join works, my plan is to add all missing key combinations to the result set for the first hour. So the query only uses the times between 0:00 to 0:55 and we'll ignore the rows between 1:00 and 8:55. This is the data for the first hour:

SQL> select *
2 from tradevolumes
3 where tradedate between trunc(sysdate) and trunc(sysdate) + interval '55' minute
4 order by timeseries_id
5 , tradedate
6 , source_id
7 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 400
1 12-07-2011 00:00:00 2 265
1 12-07-2011 00:00:00 3 307
1 12-07-2011 00:05:00 1 223
1 12-07-2011 00:05:00 2 975
1 12-07-2011 00:05:00 3 173
1 12-07-2011 00:10:00 1 794
1 12-07-2011 00:10:00 2 17
1 12-07-2011 00:10:00 3 468
1 12-07-2011 00:15:00 1 552
1 12-07-2011 00:15:00 2 979
1 12-07-2011 00:15:00 3 84
1 12-07-2011 00:20:00 1 73
1 12-07-2011 00:20:00 2 710
1 12-07-2011 00:20:00 3 54
1 12-07-2011 00:25:00 1 271
1 12-07-2011 00:25:00 2 643
1 12-07-2011 00:25:00 3 443
1 12-07-2011 00:30:00 1 408
1 12-07-2011 00:30:00 2 526
1 12-07-2011 00:30:00 3 725
1 12-07-2011 00:55:00 1 13
1 12-07-2011 00:55:00 2 943
1 12-07-2011 00:55:00 3 319
2 12-07-2011 00:00:00 1 778
2 12-07-2011 00:00:00 2 302
2 12-07-2011 00:00:00 3 959
2 12-07-2011 00:05:00 1 920
2 12-07-2011 00:05:00 2 827
2 12-07-2011 00:05:00 3 395
2 12-07-2011 00:10:00 1 534
2 12-07-2011 00:10:00 2 156
2 12-07-2011 00:10:00 3 2
2 12-07-2011 00:15:00 1 642
2 12-07-2011 00:15:00 2 278
2 12-07-2011 00:15:00 3 76
2 12-07-2011 00:20:00 1 472
2 12-07-2011 00:20:00 2 210
2 12-07-2011 00:20:00 3 549
2 12-07-2011 00:25:00 1 349
2 12-07-2011 00:25:00 3 332
2 12-07-2011 00:30:00 1 371
2 12-07-2011 00:30:00 3 396
2 12-07-2011 00:55:00 1 408
2 12-07-2011 00:55:00 3 818

45 rows selected.


The 12 five-minute-periods should be present for the 2 timeseries and 3 sources, so we need 72 rows. Now watch the bug with a standard partitioned outer join:

SQL> with all_tradedates as
2 ( select trunc(sysdate) + numtodsinterval(5*(level-1),'minute') tradedate
3 from dual
4 connect by level <= 12
5 )
6 select v.timeseries_id
7 , a.tradedate
8 , v.source_id
9 , v.volume
10 from all_tradedates a
11 left outer join tradevolumes v
12 partition by (v.timeseries_id,v.source_id)
13 on (a.tradedate = v.tradedate)
14 order by v.timeseries_id
15 , a.tradedate
16 , v.source_id
17 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 400
1 12-07-2011 00:00:00 2 265
1 12-07-2011 00:00:00 3 307
1 12-07-2011 00:05:00 1 400
1 12-07-2011 00:05:00 2 265
1 12-07-2011 00:05:00 3 307
1 12-07-2011 00:10:00 1 400
1 12-07-2011 00:10:00 2 265
1 12-07-2011 00:10:00 3 307
1 12-07-2011 00:15:00 1 400
1 12-07-2011 00:15:00 2 265
1 12-07-2011 00:15:00 3 307
1 12-07-2011 00:20:00 1 400
1 12-07-2011 00:20:00 2 265
1 12-07-2011 00:20:00 3 307
1 12-07-2011 00:25:00 1 400
1 12-07-2011 00:25:00 2 265
1 12-07-2011 00:25:00 3 307
1 12-07-2011 00:30:00 1 400
1 12-07-2011 00:30:00 2 265
1 12-07-2011 00:30:00 3 307
1 12-07-2011 00:35:00 1 400
1 12-07-2011 00:35:00 2 265
1 12-07-2011 00:35:00 3 307
1 12-07-2011 00:40:00 1 400
1 12-07-2011 00:40:00 2 265
1 12-07-2011 00:40:00 3 307
1 12-07-2011 00:45:00 1 400
1 12-07-2011 00:45:00 2 265
1 12-07-2011 00:45:00 3 307
1 12-07-2011 00:50:00 1 400
1 12-07-2011 00:50:00 2 265
1 12-07-2011 00:50:00 3 307
1 12-07-2011 00:55:00 1 400
1 12-07-2011 00:55:00 2 265
1 12-07-2011 00:55:00 3 307
2 12-07-2011 00:00:00 1 778
2 12-07-2011 00:00:00 2 302
2 12-07-2011 00:00:00 3 959
2 12-07-2011 00:05:00 1 778
2 12-07-2011 00:05:00 2 302
2 12-07-2011 00:05:00 3 959
2 12-07-2011 00:10:00 1 778
2 12-07-2011 00:10:00 2 302
2 12-07-2011 00:10:00 3 959
2 12-07-2011 00:15:00 1 778
2 12-07-2011 00:15:00 2 302
2 12-07-2011 00:15:00 3 959
2 12-07-2011 00:20:00 1 778
2 12-07-2011 00:20:00 2 302
2 12-07-2011 00:20:00 3 959
2 12-07-2011 00:25:00 1 778
2 12-07-2011 00:25:00 2 302
2 12-07-2011 00:25:00 3 959
2 12-07-2011 00:30:00 1 778
2 12-07-2011 00:30:00 2 302
2 12-07-2011 00:30:00 3 959
2 12-07-2011 00:35:00 1 778
2 12-07-2011 00:35:00 2 302
2 12-07-2011 00:35:00 3 959
2 12-07-2011 00:40:00 1 778
2 12-07-2011 00:40:00 2 302
2 12-07-2011 00:40:00 3 959
2 12-07-2011 00:45:00 1 778
2 12-07-2011 00:45:00 2 302
2 12-07-2011 00:45:00 3 959
2 12-07-2011 00:50:00 1 778
2 12-07-2011 00:50:00 2 302
2 12-07-2011 00:50:00 3 959
2 12-07-2011 00:55:00 1 778
2 12-07-2011 00:55:00 2 302
2 12-07-2011 00:55:00 3 959

72 rows selected.


The query returns 72 rows as expected, but look at the volume column. When you make up new rows, those new rows should have a null value. And there's more: the rows that existed and weren't made up, now have a volume equaling the volume of "00:00:00" and not their own value...

You might ask: why did you create 108 rows? That's because I've found that when you use 107 rows or less, the bug doesn't appear. With 108 rows or more it creeps up. Here is the same sequence with a table containing 107 rows:

SQL> drop table tradevolumes purge
2 /

Table dropped.

SQL> create table tradevolumes
2 ( timeseries_id number(4)
3 , tradedate date
4 , source_id number(4)
5 , volume number(4)
6 , constraint tradevolumes_pk primary key (timeseries_id,tradedate,source_id)
7 ) organization index
8 /

Table created.

SQL> insert into tradevolumes
2 select 1 + mod(level-1,2)
3 , trunc(sysdate) + numtodsinterval(5*mod(trunc((level-1)/2),107),'minute')
4 , ceil(level/(2 * 107))
5 , ceil(dbms_random.value(0,1000))
6 from dual
7 connect by level <= 2 * 107 * 3
8 /

642 rows created.

SQL> delete tradevolumes
2 where ( tradedate
3 between trunc(sysdate) + interval '35' minute
4 and trunc(sysdate) + interval '50' minute
5 or (timeseries_id = 1 and source_id = 3 and volume < 30)
6 or (timeseries_id = 2 and source_id = 2 and tradedate > trunc(sysdate) + interval '20' minute)
7 )
8 /

128 rows deleted.

SQL> exec dbms_stats.gather_table_stats(user,'tradevolumes')

PL/SQL procedure successfully completed.

SQL> select *
2 from tradevolumes
3 order by timeseries_id
4 , tradedate
5 , source_id
6 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 159
1 12-07-2011 00:00:00 2 333
1 12-07-2011 00:00:00 3 696
1 12-07-2011 00:05:00 1 674
1 12-07-2011 00:05:00 2 932
... skipped a lot of rows here ...
2 12-07-2011 08:45:00 1 136
2 12-07-2011 08:45:00 3 618
2 12-07-2011 08:50:00 1 705
2 12-07-2011 08:50:00 3 106

514 rows selected.

SQL> with all_tradedates as
2 ( select trunc(sysdate) + numtodsinterval(5*(level-1),'minute') tradedate
3 from dual
4 connect by level <= 12
5 )
6 select v.timeseries_id
7 , a.tradedate
8 , v.source_id
9 , v.volume
10 from all_tradedates a
11 left outer join tradevolumes v
12 partition by (v.timeseries_id,v.source_id)
13 on (a.tradedate = v.tradedate)
14 order by v.timeseries_id
15 , a.tradedate
16 , v.source_id
17 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 159
1 12-07-2011 00:00:00 2 333
1 12-07-2011 00:00:00 3 696
1 12-07-2011 00:05:00 1 674
1 12-07-2011 00:05:00 2 932
1 12-07-2011 00:05:00 3
1 12-07-2011 00:10:00 1 315
1 12-07-2011 00:10:00 2 872
1 12-07-2011 00:10:00 3 778
1 12-07-2011 00:15:00 1 398
1 12-07-2011 00:15:00 2 272
1 12-07-2011 00:15:00 3 239
1 12-07-2011 00:20:00 1 546
1 12-07-2011 00:20:00 2 863
1 12-07-2011 00:20:00 3 839
1 12-07-2011 00:25:00 1 558
1 12-07-2011 00:25:00 2 576
1 12-07-2011 00:25:00 3 296
1 12-07-2011 00:30:00 1 767
1 12-07-2011 00:30:00 2 957
1 12-07-2011 00:30:00 3 935
1 12-07-2011 00:35:00 1
1 12-07-2011 00:35:00 2
1 12-07-2011 00:35:00 3
1 12-07-2011 00:40:00 1
1 12-07-2011 00:40:00 2
1 12-07-2011 00:40:00 3
1 12-07-2011 00:45:00 1
1 12-07-2011 00:45:00 2
1 12-07-2011 00:45:00 3
1 12-07-2011 00:50:00 1
1 12-07-2011 00:50:00 2
1 12-07-2011 00:50:00 3
1 12-07-2011 00:55:00 1 398
1 12-07-2011 00:55:00 2 110
1 12-07-2011 00:55:00 3 629
2 12-07-2011 00:00:00 1 615
2 12-07-2011 00:00:00 2 924
2 12-07-2011 00:00:00 3 526
2 12-07-2011 00:05:00 1 147
2 12-07-2011 00:05:00 2 589
2 12-07-2011 00:05:00 3 779
2 12-07-2011 00:10:00 1 554
2 12-07-2011 00:10:00 2 108
2 12-07-2011 00:10:00 3 9
2 12-07-2011 00:15:00 1 601
2 12-07-2011 00:15:00 2 111
2 12-07-2011 00:15:00 3 526
2 12-07-2011 00:20:00 1 418
2 12-07-2011 00:20:00 2 818
2 12-07-2011 00:20:00 3 578
2 12-07-2011 00:25:00 1 824
2 12-07-2011 00:25:00 2
2 12-07-2011 00:25:00 3 112
2 12-07-2011 00:30:00 1 649
2 12-07-2011 00:30:00 2
2 12-07-2011 00:30:00 3 689
2 12-07-2011 00:35:00 1
2 12-07-2011 00:35:00 2
2 12-07-2011 00:35:00 3
2 12-07-2011 00:40:00 1
2 12-07-2011 00:40:00 2
2 12-07-2011 00:40:00 3
2 12-07-2011 00:45:00 1
2 12-07-2011 00:45:00 2
2 12-07-2011 00:45:00 3
2 12-07-2011 00:50:00 1
2 12-07-2011 00:50:00 2
2 12-07-2011 00:50:00 3
2 12-07-2011 00:55:00 1 651
2 12-07-2011 00:55:00 2
2 12-07-2011 00:55:00 3 641

72 rows selected.


And here the results are good. And when I add the rows for 08:55:00 afterwards:

SQL> insert into tradevolumes
2 select timeseries_id
3 , tradedate + interval '5' minute
4 , source_id
5 , 1000
6 from tradevolumes
7 where tradedate = (select max(tradedate) from tradevolumes)
8 /

4 rows created.

SQL> commit
2 /

Commit complete.

SQL> exec dbms_stats.gather_table_stats(user,'tradevolumes')

PL/SQL procedure successfully completed.

SQL> with all_tradedates as
2 ( select trunc(sysdate) + numtodsinterval(5*(level-1),'minute') tradedate
3 from dual
4 connect by level <= 12
5 )
6 select v.timeseries_id
7 , a.tradedate
8 , v.source_id
9 , v.volume
10 from all_tradedates a
11 left outer join tradevolumes v
12 partition by (v.timeseries_id,v.source_id)
13 on (a.tradedate = v.tradedate)
14 order by v.timeseries_id
15 , a.tradedate
16 , v.source_id
17 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 159
1 12-07-2011 00:00:00 2 333
1 12-07-2011 00:00:00 3 696
1 12-07-2011 00:05:00 1 674
1 12-07-2011 00:05:00 2 932
1 12-07-2011 00:05:00 3
1 12-07-2011 00:10:00 1 315
1 12-07-2011 00:10:00 2 872
1 12-07-2011 00:10:00 3 778
1 12-07-2011 00:15:00 1 398
1 12-07-2011 00:15:00 2 272
1 12-07-2011 00:15:00 3 239
1 12-07-2011 00:20:00 1 546
1 12-07-2011 00:20:00 2 863
1 12-07-2011 00:20:00 3 839
1 12-07-2011 00:25:00 1 558
1 12-07-2011 00:25:00 2 576
1 12-07-2011 00:25:00 3 296
1 12-07-2011 00:30:00 1 767
1 12-07-2011 00:30:00 2 957
1 12-07-2011 00:30:00 3 935
1 12-07-2011 00:35:00 1
1 12-07-2011 00:35:00 2
1 12-07-2011 00:35:00 3
1 12-07-2011 00:40:00 1
1 12-07-2011 00:40:00 2
1 12-07-2011 00:40:00 3
1 12-07-2011 00:45:00 1
1 12-07-2011 00:45:00 2
1 12-07-2011 00:45:00 3
1 12-07-2011 00:50:00 1
1 12-07-2011 00:50:00 2
1 12-07-2011 00:50:00 3
1 12-07-2011 00:55:00 1 398
1 12-07-2011 00:55:00 2 110
1 12-07-2011 00:55:00 3 629
2 12-07-2011 00:00:00 1 615
2 12-07-2011 00:00:00 2 924
2 12-07-2011 00:00:00 3 526
2 12-07-2011 00:05:00 1 147
2 12-07-2011 00:05:00 2 589
2 12-07-2011 00:05:00 3 779
2 12-07-2011 00:10:00 1 554
2 12-07-2011 00:10:00 2 108
2 12-07-2011 00:10:00 3 9
2 12-07-2011 00:15:00 1 601
2 12-07-2011 00:15:00 2 111
2 12-07-2011 00:15:00 3 526
2 12-07-2011 00:20:00 1 418
2 12-07-2011 00:20:00 2 818
2 12-07-2011 00:20:00 3 578
2 12-07-2011 00:25:00 1 824
2 12-07-2011 00:25:00 2
2 12-07-2011 00:25:00 3 112
2 12-07-2011 00:30:00 1 649
2 12-07-2011 00:30:00 2
2 12-07-2011 00:30:00 3 689
2 12-07-2011 00:35:00 1
2 12-07-2011 00:35:00 2
2 12-07-2011 00:35:00 3
2 12-07-2011 00:40:00 1
2 12-07-2011 00:40:00 2
2 12-07-2011 00:40:00 3
2 12-07-2011 00:45:00 1
2 12-07-2011 00:45:00 2
2 12-07-2011 00:45:00 3
2 12-07-2011 00:50:00 1
2 12-07-2011 00:50:00 2
2 12-07-2011 00:50:00 3
2 12-07-2011 00:55:00 1 651
2 12-07-2011 00:55:00 2
2 12-07-2011 00:55:00 3 641

72 rows selected.


And now it works!

And using 108 rows with a regular heap table also isn't a problem:

SQL> drop table tradevolumes purge
2 /

Table dropped.

SQL> create table tradevolumes
2 ( timeseries_id number(4)
3 , tradedate date
4 , source_id number(4)
5 , volume number(4)
6 , constraint tradevolumes_pk primary key (timeseries_id,tradedate,source_id)
7 )
8 /

Table created.

SQL> insert into tradevolumes
2 select 1 + mod(level-1,2)
3 , trunc(sysdate) + numtodsinterval(5*mod(trunc((level-1)/2),108),'minute')
4 , ceil(level/(2 * 108))
5 , ceil(dbms_random.value(0,1000))
6 from dual
7 connect by level <= 2 * 108 * 3
8 /

648 rows created.

SQL> delete tradevolumes
2 where ( tradedate
3 between trunc(sysdate) + interval '35' minute
4 and trunc(sysdate) + interval '50' minute
5 or (timeseries_id = 1 and source_id = 3 and volume < 30)
6 or (timeseries_id = 2 and source_id = 2 and tradedate > trunc(sysdate) + interval '20' minute)
7 )
8 /

126 rows deleted.

SQL> exec dbms_stats.gather_table_stats(user,'tradevolumes')

PL/SQL procedure successfully completed.

SQL> select *
2 from tradevolumes
3 order by timeseries_id
4 , tradedate
5 , source_id
6 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 200
1 12-07-2011 00:00:00 2 157
1 12-07-2011 00:00:00 3 950
1 12-07-2011 00:05:00 1 707
1 12-07-2011 00:05:00 2 932
1 12-07-2011 00:05:00 3 258
... skipped a lot of rows here ...
2 12-07-2011 08:50:00 1 61
2 12-07-2011 08:50:00 3 974
2 12-07-2011 08:55:00 1 692
2 12-07-2011 08:55:00 3 714

522 rows selected.

SQL> with all_tradedates as
2 ( select trunc(sysdate) + numtodsinterval(5*(level-1),'minute') tradedate
3 from dual
4 connect by level <= 12
5 )
6 select v.timeseries_id
7 , a.tradedate
8 , v.source_id
9 , v.volume
10 from all_tradedates a
11 left outer join tradevolumes v
12 partition by (v.timeseries_id,v.source_id)
13 on (a.tradedate = v.tradedate)
14 order by v.timeseries_id
15 , a.tradedate
16 , v.source_id
17 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 200
1 12-07-2011 00:00:00 2 157
1 12-07-2011 00:00:00 3 950
1 12-07-2011 00:05:00 1 707
1 12-07-2011 00:05:00 2 932
1 12-07-2011 00:05:00 3 258
1 12-07-2011 00:10:00 1 586
1 12-07-2011 00:10:00 2 193
1 12-07-2011 00:10:00 3 753
1 12-07-2011 00:15:00 1 287
1 12-07-2011 00:15:00 2 420
1 12-07-2011 00:15:00 3 113
1 12-07-2011 00:20:00 1 971
1 12-07-2011 00:20:00 2 93
1 12-07-2011 00:20:00 3 967
1 12-07-2011 00:25:00 1 685
1 12-07-2011 00:25:00 2 63
1 12-07-2011 00:25:00 3 832
1 12-07-2011 00:30:00 1 471
1 12-07-2011 00:30:00 2 764
1 12-07-2011 00:30:00 3 277
1 12-07-2011 00:35:00 1
1 12-07-2011 00:35:00 2
1 12-07-2011 00:35:00 3
1 12-07-2011 00:40:00 1
1 12-07-2011 00:40:00 2
1 12-07-2011 00:40:00 3
1 12-07-2011 00:45:00 1
1 12-07-2011 00:45:00 2
1 12-07-2011 00:45:00 3
1 12-07-2011 00:50:00 1
1 12-07-2011 00:50:00 2
1 12-07-2011 00:50:00 3
1 12-07-2011 00:55:00 1 332
1 12-07-2011 00:55:00 2 909
1 12-07-2011 00:55:00 3 873
2 12-07-2011 00:00:00 1 865
2 12-07-2011 00:00:00 2 982
2 12-07-2011 00:00:00 3 290
2 12-07-2011 00:05:00 1 936
2 12-07-2011 00:05:00 2 651
2 12-07-2011 00:05:00 3 141
2 12-07-2011 00:10:00 1 843
2 12-07-2011 00:10:00 2 818
2 12-07-2011 00:10:00 3 218
2 12-07-2011 00:15:00 1 85
2 12-07-2011 00:15:00 2 530
2 12-07-2011 00:15:00 3 650
2 12-07-2011 00:20:00 1 475
2 12-07-2011 00:20:00 2 988
2 12-07-2011 00:20:00 3 771
2 12-07-2011 00:25:00 1 422
2 12-07-2011 00:25:00 2
2 12-07-2011 00:25:00 3 48
2 12-07-2011 00:30:00 1 88
2 12-07-2011 00:30:00 2
2 12-07-2011 00:30:00 3 991
2 12-07-2011 00:35:00 1
2 12-07-2011 00:35:00 2
2 12-07-2011 00:35:00 3
2 12-07-2011 00:40:00 1
2 12-07-2011 00:40:00 2
2 12-07-2011 00:40:00 3
2 12-07-2011 00:45:00 1
2 12-07-2011 00:45:00 2
2 12-07-2011 00:45:00 3
2 12-07-2011 00:50:00 1
2 12-07-2011 00:50:00 2
2 12-07-2011 00:50:00 3
2 12-07-2011 00:55:00 1 793
2 12-07-2011 00:55:00 2
2 12-07-2011 00:55:00 3 601

72 rows selected.


The above was all executed in version 11.2.0.2 and I've also tried it on 10.2.0.4 and I witnessed exactly the same phenomenon there. The only difference is that the boundary value is not 107/108 but 80/81.

I did some more basic investigations regarding space occupation and statistics, but I haven't found a clue so far. I'm wondering if I'm missing something obvious. So, if you have some more information about affected versions or the cause, I'd like to know.

4 comments:

  1. Thanks Greg.
    The bug number isn't visible for me on MOS though.

    ReplyDelete
  2. Hello Rob,

    Today I found your blog web page below, but for some reason I am not able to post a message to that thread ( some Google account problems, they also have bugs ...)

    Since I encunter the same problem, I just wanted to share my conclusions with you.

    I would just add that today I encountered the partitioned outer join bug with a much simpler case than yours, a simple classic case of 3 tables: customers, products, sales, stored as regular heap tables, with the sales table referencing the 2 dimension tables and with only 4 customer rows, 3 product rows and 9 sales rows inserted.


    The bug exists in 11.1.0.7.0, but everything works correctly in 10.2.0.3.0 and probably also in 10.2.0.4.0.

    I don't have 11gR2 to test it out.

    What I found is that if I perform the following in 11.1.0.7.0:

    alter session set optimizer_features_enable='10.2.0.4';

    then it starts behaving correctly !

    I cannot say whether the bug was introduced in 11g when applying one of the patches, or it existed from the beginning.

    I can hardly believe that Oracle did release a new version with such a bug ...

    Also, a few months ago I also used it in 11g with some other case, and then it seemed to be working correctly.

    Now I am totally helpless, do you know whether Oracle has supplied a possible correction for it,
    maybe setting an event, a.s.o.

    And, if there are bugs in 11gR2 also ... then the future also looks bad ...

    Thanks a lot & Best Regards,
    Iudith Mentzel.

    ReplyDelete
  3. Hello Rob again,

    As a completion to the below foundings:

    Using "ALTER SESSION set optimizer_features_enable = " with any of the following settings:

    9.2.0.8
    10.2.0.3
    10.2.0.4
    10.2.0.5
    11.1.0.6

    works ok in both Oracle 11.1.0.7.0 and 11.2.0.1.0.

    Using any of the following:

    11.1.0.7
    11.2.0.1
    11.2.0.1.1

    DOES NOT work in Oracle 11.2.0.1.0 ( 11.1.0.7 also does not work in Oracle 11.1.0.7.0 ).


    When I say "does not work", the behavior I encounter is that instead of the PARTITIONED OUTER JOIN it simply performs an INNER JOIN of the 2 tables.


    So, it looks that the bug was definitely introduced starting from 11.1.0.7 .

    Since the above settings help to avoid the bug, it looks that it is a side-effect of some internal action/decision of the optimizer, therefore probably not very simple to correct ...


    Is this feature really so "underused" that only few people discovered this bug ?

    I still can hardly believe it, because I find this feature very useful and elegant, though I am not exactly specialized in data warehouse programming .


    Sorry again that I am not able to post directly to your blog page, my Google account does not work
    and the page does not accept "Anonymous" postings ...


    But I will be definitely following your blog topic.

    Thanks a lot & Best Regards,
    Iudith Mentzel

    ReplyDelete