Sunday, September 25, 2011

EXISTS

Most of us know the SQL keyword EXISTS as the condition with a subquery that you use in a WHERE clause. But if you look at the documentation of EXISTS, you'll see it says nothing about being restricted to the WHERE clause. EXISTS is just a sort of function that accepts a query as input and returns "TRUE if a subquery returns at least one row". SQL has no boolean data type; instead it calls these "conditions". And according to another part of the documentation, about SQL conditions:

You can use a condition in the WHERE clause of these statements:

DELETE
SELECT
UPDATE

You can use a condition in any of these clauses of the SELECT statement:

WHERE
START WITH
CONNECT BY
HAVING

So you might be tempted to believe it is not possible to use EXISTS in the SELECT clause. However, a CASE expression, more specifically the searched CASE expression, expects a "condition" after the WHEN keyword, as you can see in its syntax diagram. That means we can put an EXISTS condition after the WHEN.
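In its generic form that looks like the sketch below; parent and child are hypothetical placeholder tables, not part of the test case that follows:

```sql
-- a searched CASE expression accepts any condition after WHEN,
-- so an EXISTS condition works inside the SELECT clause
select p.id
     , case
         when exists ( select null
                         from child c
                        where c.parent_id = p.id )
         then 'yes'
         else 'no'
       end has_child
  from parent p
```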

An example of this came up recently when a colleague asked me for help with a query in his conversion project. His setup resembles this test case:

SQL> create table t1 (id)
2 as
3 select level
4 from dual
5 connect by level <= 4
6 /

Table created.

SQL> alter table t1 add constraint t1_pk primary key (id)
2 /

Table altered.

SQL> exec dbms_stats.gather_table_stats(user,'t1',cascade=>true)

PL/SQL procedure successfully completed.

SQL> create table t2 (id, t1_id, status, filler)
2 as
3 select 1, 1, 'w', lpad('*',1000,'*') from dual union all
4 select 2, 2, 'x', lpad('*',1000,'*') from dual union all
5 select 3, 2, 'y', lpad('*',1000,'*') from dual union all
6 select 4, 3, 'w', lpad('*',1000,'*') from dual union all
7 select 5, 3, 'x', lpad('*',1000,'*') from dual union all
8 select 6, 3, 'z', lpad('*',1000,'*') from dual union all
9 select 7, 4, 'w', lpad('*',1000,'*') from dual union all
10 select 8, 4, 'z', lpad('*',1000,'*') from dual
11 /

Table created.

SQL> alter table t2 add constraint t2_pk primary key (id)
2 /

Table altered.

SQL> alter table t2 add constraint t2_t1_fk foreign key (t1_id) references t1(id)
2 /

Table altered.

SQL> create index t2_i on t2 (status,t1_id)
2 /

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'t2',cascade=>true)

PL/SQL procedure successfully completed.

Two tables with a master-detail relationship. The question was to come up with a query that selects all rows of t1 and prints 'yes' in an extra column if there is at least one row in t2 with status x or y, and 'no' otherwise.

The first query I came up with uses a simple outer join and removes the duplicates with a distinct:

SQL> select /*+ gather_plan_statistics */
2 distinct t1.id
3 , nvl2(t2.id,'yes','no') exists_t2
4 from t1
5 left outer join t2
6 on ( t1.id = t2.t1_id
7 and t2.status in ('x','y')
8 )
9 order by t1.id
10 /

ID EXISTS_T2
----- ----------
1 no
2 yes
3 yes
4 no

4 rows selected.

And it is done in 4 consistent gets using this plan:

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
2 /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 2mtksgp0pzx59, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ distinct t1.id ,
nvl2(t2.id,'yes','no') exists_t2 from t1 left outer join t2
on ( t1.id = t2.t1_id and t2.status in ('x','y')
) order by t1.id

Plan hash value: 1021492722

------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 |00:00:00.01 | 4 | | | |
| 1 | SORT UNIQUE | | 1 | 4 | 4 |00:00:00.01 | 4 | 2048 | 2048 | 2048 (0)|
| 2 | MERGE JOIN OUTER | | 1 | 4 | 5 |00:00:00.01 | 4 | | | |
| 3 | INDEX FULL SCAN | T1_PK | 1 | 4 | 4 |00:00:00.01 | 1 | | | |
|* 4 | SORT JOIN | | 4 | 4 | 3 |00:00:00.01 | 3 | 2048 | 2048 | 2048 (0)|
| 5 | INLIST ITERATOR | | 1 | | 3 |00:00:00.01 | 3 | | | |
| 6 | TABLE ACCESS BY INDEX ROWID| T2 | 2 | 4 | 3 |00:00:00.01 | 3 | | | |
|* 7 | INDEX RANGE SCAN | T2_I | 2 | 4 | 3 |00:00:00.01 | 2 | | | |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T1"."ID"="T2"."T1_ID")
filter("T1"."ID"="T2"."T1_ID")
7 - access(("T2"."STATUS"='x' OR "T2"."STATUS"='y'))


29 rows selected.

Later, after realizing I could also use EXISTS in the SELECT clause, I offered an alternative:

SQL> select /*+ gather_plan_statistics */
2 t1.id
3 , case
4 when exists
5 ( select ''
6 from t2
7 where t2.t1_id = t1.id
8 and t2.status in ('x','y')
9 )
10 then 'yes'
11 else 'no'
12 end exists_t2
13 from t1
14 order by t1.id
15 /

ID EXISTS_T2
----- ----------
1 no
2 yes
3 yes
4 no

4 rows selected.

So that's proof that you can use EXISTS in a SELECT clause. But this query uses 8 consistent gets (6+2):

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
2 /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID b13u3f5m9ybuu, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ t1.id , case
when exists ( select '' from t2 where
t2.t1_id = t1.id and t2.status in ('x','y') )
then 'yes' else 'no' end exists_t2 from t1 order
by t1.id

Plan hash value: 1623664937

-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 |00:00:00.01 | 2 |
| 1 | INLIST ITERATOR | | 4 | | 2 |00:00:00.01 | 6 |
|* 2 | INDEX RANGE SCAN| T2_I | 6 | 1 | 2 |00:00:00.01 | 6 |
| 3 | INDEX FULL SCAN | T1_PK | 1 | 4 | 4 |00:00:00.01 | 2 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access((("T2"."STATUS"='x' OR "T2"."STATUS"='y')) AND
"T2"."T1_ID"=:B1)


25 rows selected.

So with this data set it's not really interesting, although the plan is shorter in number of operations. With other data distributions it can be useful though, for example after blowing up the t2 table:

SQL> insert into t2
2 select n * 1000 + t2.id
3 , t2.t1_id
4 , t2.status
5 , t2.filler
6 from t2
7 cross join (select level n from dual connect by level <= 1000)
8 /

8000 rows created.

SQL> exec dbms_stats.gather_table_stats(user,'t2',cascade=>true,method_opt=>'for all columns size auto')

PL/SQL procedure successfully completed.

Now the left outer join with the distinct has to join the 4 rows of t1 with 3003 rows of t2, and the resulting query uses 3017 consistent gets:

SQL> select /*+ gather_plan_statistics */
2 distinct t1.id
3 , nvl2(t2.id,'yes','no') exists_t2
4 from t1
5 left outer join t2
6 on ( t1.id = t2.t1_id
7 and t2.status in ('x','y')
8 )
9 order by t1.id
10 /

ID EXISTS_T2
----- ----------
1 no
2 yes
3 yes
4 no

4 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
2 /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 2mtksgp0pzx59, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ distinct t1.id ,
nvl2(t2.id,'yes','no') exists_t2 from t1 left outer join t2
on ( t1.id = t2.t1_id and t2.status in ('x','y')
) order by t1.id

Plan hash value: 1021492722

------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 |00:00:00.01 | 3017 | | | |
| 1 | SORT UNIQUE | | 1 | 4 | 4 |00:00:00.01 | 3017 | 2048 | 2048 | 2048 (0)|
| 2 | MERGE JOIN OUTER | | 1 | 4 | 3005 |00:00:00.01 | 3017 | | | |
| 3 | INDEX FULL SCAN | T1_PK | 1 | 4 | 4 |00:00:00.01 | 1 | | | |
|* 4 | SORT JOIN | | 4 | 4 | 3003 |00:00:00.01 | 3016 | 115K| 115K| 102K (0)|
| 5 | INLIST ITERATOR | | 1 | | 3003 |00:00:00.01 | 3016 | | | |
| 6 | TABLE ACCESS BY INDEX ROWID| T2 | 2 | 4 | 3003 |00:00:00.01 | 3016 | | | |
|* 7 | INDEX RANGE SCAN | T2_I | 2 | 4 | 3003 |00:00:00.01 | 13 | | | |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T1"."ID"="T2"."T1_ID")
filter("T1"."ID"="T2"."T1_ID")
7 - access(("T2"."STATUS"='x' OR "T2"."STATUS"='y'))


29 rows selected.

And the second query is now a whole lot more efficient, with only 15 consistent gets:

SQL> select /*+ gather_plan_statistics */
2 t1.id
3 , case
4 when exists
5 ( select ''
6 from t2
7 where t2.t1_id = t1.id
8 and t2.status in ('x','y')
9 )
10 then 'yes'
11 else 'no'
12 end exists_t2
13 from t1
14 order by t1.id
15 /

ID EXISTS_T2
----- ----------
1 no
2 yes
3 yes
4 no

4 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
2 /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID b13u3f5m9ybuu, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ t1.id , case
when exists ( select '' from t2 where
t2.t1_id = t1.id and t2.status in ('x','y') )
then 'yes' else 'no' end exists_t2 from t1 order
by t1.id

Plan hash value: 1623664937

-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 4 |00:00:00.01 | 2 |
| 1 | INLIST ITERATOR | | 4 | | 2 |00:00:00.01 | 13 |
|* 2 | INDEX RANGE SCAN| T2_I | 6 | 1 | 2 |00:00:00.01 | 13 |
| 3 | INDEX FULL SCAN | T1_PK | 1 | 4 | 4 |00:00:00.01 | 2 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access((("T2"."STATUS"='x' OR "T2"."STATUS"='y')) AND
"T2"."T1_ID"=:B1)


25 rows selected.

The query selects the four rows of t1 with the INDEX FULL SCAN in line 3. The six starts of line 2 are explained by the INLIST ITERATOR, which splits the access of t2 into one probe for status = 'x' and one for status = 'y'. It searches for an 'x' for all four rows of t1. For t1.id 2 and 3 it finds an 'x' value and doesn't need to look for a 'y' value; only for t1.id 1 and 4 does it need to search for a 'y' value.

Note that not only the data distribution is a factor; the index on (status,t1_id) is needed as well to make a difference here.

I guess I just want to say that there is an alternative you might not have thought of initially, and that it can be useful in certain situations.




PS: Something completely different: Kamran Agayev has posted an interview with me.

Tuesday, July 12, 2011

Partitioned outer join bug

I came across a strange bug when constructing an example of how a partitioned outer join works. I used a setup resembling a situation with timeseries data that I recently consulted on. The table can hold multiple timeseries whose data comes from multiple sources. The data itself contains volumes reported every five minutes. The table looks like this:

SQL> create table tradevolumes
2 ( timeseries_id number(4)
3 , tradedate date
4 , source_id number(4)
5 , volume number(4)
6 , constraint tradevolumes_pk primary key (timeseries_id,tradedate,source_id)
7 ) organization index
8 /

Table created.


An index-organized table, which we fill with some data: 2 timeseries from 3 sources, and 108 five-minute periods:

SQL> insert into tradevolumes
2 select 1 + mod(level-1,2)
3 , trunc(sysdate) + numtodsinterval(5*mod(trunc((level-1)/2),108),'minute')
4 , ceil(level/(2 * 108))
5 , ceil(dbms_random.value(0,1000))
6 from dual
7 connect by level <= 2 * 108 * 3
8 /

648 rows created.


A common problem for timeseries data is missing data. I'll simulate that by deleting several rows:

SQL> delete tradevolumes
2 where ( tradedate
3 between trunc(sysdate) + interval '35' minute
4 and trunc(sysdate) + interval '50' minute
5 or (timeseries_id = 1 and source_id = 3 and volume < 30)
6 or (timeseries_id = 2 and source_id = 2 and tradedate > trunc(sysdate) + interval '20' minute)
7 )
8 /

127 rows deleted.


And now gather some basic statistics for the table:

SQL> exec dbms_stats.gather_table_stats(user,'tradevolumes')

PL/SQL procedure successfully completed.


To get an idea of the data that is now in the table:

SQL> select *
2 from tradevolumes
3 order by timeseries_id
4 , tradedate
5 , source_id
6 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 400
1 12-07-2011 00:00:00 2 265
1 12-07-2011 00:00:00 3 307
1 12-07-2011 00:05:00 1 223
1 12-07-2011 00:05:00 2 975
1 12-07-2011 00:05:00 3 173
1 12-07-2011 00:10:00 1 794
1 12-07-2011 00:10:00 2 17
1 12-07-2011 00:10:00 3 468
1 12-07-2011 00:15:00 1 552
1 12-07-2011 00:15:00 2 979
1 12-07-2011 00:15:00 3 84
1 12-07-2011 00:20:00 1 73
1 12-07-2011 00:20:00 2 710
1 12-07-2011 00:20:00 3 54
1 12-07-2011 00:25:00 1 271
1 12-07-2011 00:25:00 2 643
1 12-07-2011 00:25:00 3 443
1 12-07-2011 00:30:00 1 408
1 12-07-2011 00:30:00 2 526
1 12-07-2011 00:30:00 3 725
1 12-07-2011 00:55:00 1 13
1 12-07-2011 00:55:00 2 943
1 12-07-2011 00:55:00 3 319
... skipped a lot of rows here ...
1 12-07-2011 08:50:00 1 239
1 12-07-2011 08:50:00 2 502
1 12-07-2011 08:50:00 3 297
1 12-07-2011 08:55:00 1 620
1 12-07-2011 08:55:00 2 531
1 12-07-2011 08:55:00 3 377
2 12-07-2011 00:00:00 1 778
2 12-07-2011 00:00:00 2 302
2 12-07-2011 00:00:00 3 959
2 12-07-2011 00:05:00 1 920
2 12-07-2011 00:05:00 2 827
2 12-07-2011 00:05:00 3 395
... skipped a lot of rows here ...
2 12-07-2011 08:45:00 1 495
2 12-07-2011 08:45:00 3 519
2 12-07-2011 08:50:00 1 556
2 12-07-2011 08:50:00 3 193
2 12-07-2011 08:55:00 1 368
2 12-07-2011 08:55:00 3 18

521 rows selected.


To show how a partitioned outer join works, my plan is to add all missing key combinations to the result set for the first hour. So the query only uses the times between 0:00 and 0:55, and we'll ignore the rows between 1:00 and 8:55. This is the data for the first hour:

SQL> select *
2 from tradevolumes
3 where tradedate between trunc(sysdate) and trunc(sysdate) + interval '55' minute
4 order by timeseries_id
5 , tradedate
6 , source_id
7 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 400
1 12-07-2011 00:00:00 2 265
1 12-07-2011 00:00:00 3 307
1 12-07-2011 00:05:00 1 223
1 12-07-2011 00:05:00 2 975
1 12-07-2011 00:05:00 3 173
1 12-07-2011 00:10:00 1 794
1 12-07-2011 00:10:00 2 17
1 12-07-2011 00:10:00 3 468
1 12-07-2011 00:15:00 1 552
1 12-07-2011 00:15:00 2 979
1 12-07-2011 00:15:00 3 84
1 12-07-2011 00:20:00 1 73
1 12-07-2011 00:20:00 2 710
1 12-07-2011 00:20:00 3 54
1 12-07-2011 00:25:00 1 271
1 12-07-2011 00:25:00 2 643
1 12-07-2011 00:25:00 3 443
1 12-07-2011 00:30:00 1 408
1 12-07-2011 00:30:00 2 526
1 12-07-2011 00:30:00 3 725
1 12-07-2011 00:55:00 1 13
1 12-07-2011 00:55:00 2 943
1 12-07-2011 00:55:00 3 319
2 12-07-2011 00:00:00 1 778
2 12-07-2011 00:00:00 2 302
2 12-07-2011 00:00:00 3 959
2 12-07-2011 00:05:00 1 920
2 12-07-2011 00:05:00 2 827
2 12-07-2011 00:05:00 3 395
2 12-07-2011 00:10:00 1 534
2 12-07-2011 00:10:00 2 156
2 12-07-2011 00:10:00 3 2
2 12-07-2011 00:15:00 1 642
2 12-07-2011 00:15:00 2 278
2 12-07-2011 00:15:00 3 76
2 12-07-2011 00:20:00 1 472
2 12-07-2011 00:20:00 2 210
2 12-07-2011 00:20:00 3 549
2 12-07-2011 00:25:00 1 349
2 12-07-2011 00:25:00 3 332
2 12-07-2011 00:30:00 1 371
2 12-07-2011 00:30:00 3 396
2 12-07-2011 00:55:00 1 408
2 12-07-2011 00:55:00 3 818

45 rows selected.


All 12 five-minute periods should be present for the 2 timeseries and 3 sources, so we expect 72 rows. Now watch the bug with a standard partitioned outer join:

SQL> with all_tradedates as
2 ( select trunc(sysdate) + numtodsinterval(5*(level-1),'minute') tradedate
3 from dual
4 connect by level <= 12
5 )
6 select v.timeseries_id
7 , a.tradedate
8 , v.source_id
9 , v.volume
10 from all_tradedates a
11 left outer join tradevolumes v
12 partition by (v.timeseries_id,v.source_id)
13 on (a.tradedate = v.tradedate)
14 order by v.timeseries_id
15 , a.tradedate
16 , v.source_id
17 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 400
1 12-07-2011 00:00:00 2 265
1 12-07-2011 00:00:00 3 307
1 12-07-2011 00:05:00 1 400
1 12-07-2011 00:05:00 2 265
1 12-07-2011 00:05:00 3 307
1 12-07-2011 00:10:00 1 400
1 12-07-2011 00:10:00 2 265
1 12-07-2011 00:10:00 3 307
1 12-07-2011 00:15:00 1 400
1 12-07-2011 00:15:00 2 265
1 12-07-2011 00:15:00 3 307
1 12-07-2011 00:20:00 1 400
1 12-07-2011 00:20:00 2 265
1 12-07-2011 00:20:00 3 307
1 12-07-2011 00:25:00 1 400
1 12-07-2011 00:25:00 2 265
1 12-07-2011 00:25:00 3 307
1 12-07-2011 00:30:00 1 400
1 12-07-2011 00:30:00 2 265
1 12-07-2011 00:30:00 3 307
1 12-07-2011 00:35:00 1 400
1 12-07-2011 00:35:00 2 265
1 12-07-2011 00:35:00 3 307
1 12-07-2011 00:40:00 1 400
1 12-07-2011 00:40:00 2 265
1 12-07-2011 00:40:00 3 307
1 12-07-2011 00:45:00 1 400
1 12-07-2011 00:45:00 2 265
1 12-07-2011 00:45:00 3 307
1 12-07-2011 00:50:00 1 400
1 12-07-2011 00:50:00 2 265
1 12-07-2011 00:50:00 3 307
1 12-07-2011 00:55:00 1 400
1 12-07-2011 00:55:00 2 265
1 12-07-2011 00:55:00 3 307
2 12-07-2011 00:00:00 1 778
2 12-07-2011 00:00:00 2 302
2 12-07-2011 00:00:00 3 959
2 12-07-2011 00:05:00 1 778
2 12-07-2011 00:05:00 2 302
2 12-07-2011 00:05:00 3 959
2 12-07-2011 00:10:00 1 778
2 12-07-2011 00:10:00 2 302
2 12-07-2011 00:10:00 3 959
2 12-07-2011 00:15:00 1 778
2 12-07-2011 00:15:00 2 302
2 12-07-2011 00:15:00 3 959
2 12-07-2011 00:20:00 1 778
2 12-07-2011 00:20:00 2 302
2 12-07-2011 00:20:00 3 959
2 12-07-2011 00:25:00 1 778
2 12-07-2011 00:25:00 2 302
2 12-07-2011 00:25:00 3 959
2 12-07-2011 00:30:00 1 778
2 12-07-2011 00:30:00 2 302
2 12-07-2011 00:30:00 3 959
2 12-07-2011 00:35:00 1 778
2 12-07-2011 00:35:00 2 302
2 12-07-2011 00:35:00 3 959
2 12-07-2011 00:40:00 1 778
2 12-07-2011 00:40:00 2 302
2 12-07-2011 00:40:00 3 959
2 12-07-2011 00:45:00 1 778
2 12-07-2011 00:45:00 2 302
2 12-07-2011 00:45:00 3 959
2 12-07-2011 00:50:00 1 778
2 12-07-2011 00:50:00 2 302
2 12-07-2011 00:50:00 3 959
2 12-07-2011 00:55:00 1 778
2 12-07-2011 00:55:00 2 302
2 12-07-2011 00:55:00 3 959

72 rows selected.


The query returns 72 rows as expected, but look at the volume column. Made-up rows should have a null volume. And there's more: the rows that did exist, and weren't made up, now show the volume of the 00:00:00 row instead of their own value...
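Until the cause is clear, one way around it is to avoid the partitioned outer join and generate the partition keys explicitly, with a cross join against the generated dates and a regular outer join back to the data. This is only a sketch of the idea, reusing the tables from this test case; I have not verified it against the 107/108 boundary:

```sql
with all_tradedates as
( select trunc(sysdate) + numtodsinterval(5*(level-1),'minute') tradedate
    from dual
  connect by level <= 12
)
, all_keys as
( -- the combinations the PARTITION BY clause would have produced
  select distinct timeseries_id
       , source_id
    from tradevolumes
)
select k.timeseries_id
     , a.tradedate
     , k.source_id
     , v.volume
  from all_keys k
 cross join all_tradedates a
  left outer join tradevolumes v
    on (    v.timeseries_id = k.timeseries_id
        and v.source_id     = k.source_id
        and v.tradedate     = a.tradedate
       )
 order by k.timeseries_id
        , a.tradedate
        , k.source_id
/
```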

You might ask: why did you create 108 rows? Because I've found that with 107 rows or fewer the bug doesn't appear; with 108 rows or more it creeps in. Here is the same sequence with a table containing 107 rows:

SQL> drop table tradevolumes purge
2 /

Table dropped.

SQL> create table tradevolumes
2 ( timeseries_id number(4)
3 , tradedate date
4 , source_id number(4)
5 , volume number(4)
6 , constraint tradevolumes_pk primary key (timeseries_id,tradedate,source_id)
7 ) organization index
8 /

Table created.

SQL> insert into tradevolumes
2 select 1 + mod(level-1,2)
3 , trunc(sysdate) + numtodsinterval(5*mod(trunc((level-1)/2),107),'minute')
4 , ceil(level/(2 * 107))
5 , ceil(dbms_random.value(0,1000))
6 from dual
7 connect by level <= 2 * 107 * 3
8 /

642 rows created.

SQL> delete tradevolumes
2 where ( tradedate
3 between trunc(sysdate) + interval '35' minute
4 and trunc(sysdate) + interval '50' minute
5 or (timeseries_id = 1 and source_id = 3 and volume < 30)
6 or (timeseries_id = 2 and source_id = 2 and tradedate > trunc(sysdate) + interval '20' minute)
7 )
8 /

128 rows deleted.

SQL> exec dbms_stats.gather_table_stats(user,'tradevolumes')

PL/SQL procedure successfully completed.

SQL> select *
2 from tradevolumes
3 order by timeseries_id
4 , tradedate
5 , source_id
6 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 159
1 12-07-2011 00:00:00 2 333
1 12-07-2011 00:00:00 3 696
1 12-07-2011 00:05:00 1 674
1 12-07-2011 00:05:00 2 932
... skipped a lot of rows here ...
2 12-07-2011 08:45:00 1 136
2 12-07-2011 08:45:00 3 618
2 12-07-2011 08:50:00 1 705
2 12-07-2011 08:50:00 3 106

514 rows selected.

SQL> with all_tradedates as
2 ( select trunc(sysdate) + numtodsinterval(5*(level-1),'minute') tradedate
3 from dual
4 connect by level <= 12
5 )
6 select v.timeseries_id
7 , a.tradedate
8 , v.source_id
9 , v.volume
10 from all_tradedates a
11 left outer join tradevolumes v
12 partition by (v.timeseries_id,v.source_id)
13 on (a.tradedate = v.tradedate)
14 order by v.timeseries_id
15 , a.tradedate
16 , v.source_id
17 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 159
1 12-07-2011 00:00:00 2 333
1 12-07-2011 00:00:00 3 696
1 12-07-2011 00:05:00 1 674
1 12-07-2011 00:05:00 2 932
1 12-07-2011 00:05:00 3
1 12-07-2011 00:10:00 1 315
1 12-07-2011 00:10:00 2 872
1 12-07-2011 00:10:00 3 778
1 12-07-2011 00:15:00 1 398
1 12-07-2011 00:15:00 2 272
1 12-07-2011 00:15:00 3 239
1 12-07-2011 00:20:00 1 546
1 12-07-2011 00:20:00 2 863
1 12-07-2011 00:20:00 3 839
1 12-07-2011 00:25:00 1 558
1 12-07-2011 00:25:00 2 576
1 12-07-2011 00:25:00 3 296
1 12-07-2011 00:30:00 1 767
1 12-07-2011 00:30:00 2 957
1 12-07-2011 00:30:00 3 935
1 12-07-2011 00:35:00 1
1 12-07-2011 00:35:00 2
1 12-07-2011 00:35:00 3
1 12-07-2011 00:40:00 1
1 12-07-2011 00:40:00 2
1 12-07-2011 00:40:00 3
1 12-07-2011 00:45:00 1
1 12-07-2011 00:45:00 2
1 12-07-2011 00:45:00 3
1 12-07-2011 00:50:00 1
1 12-07-2011 00:50:00 2
1 12-07-2011 00:50:00 3
1 12-07-2011 00:55:00 1 398
1 12-07-2011 00:55:00 2 110
1 12-07-2011 00:55:00 3 629
2 12-07-2011 00:00:00 1 615
2 12-07-2011 00:00:00 2 924
2 12-07-2011 00:00:00 3 526
2 12-07-2011 00:05:00 1 147
2 12-07-2011 00:05:00 2 589
2 12-07-2011 00:05:00 3 779
2 12-07-2011 00:10:00 1 554
2 12-07-2011 00:10:00 2 108
2 12-07-2011 00:10:00 3 9
2 12-07-2011 00:15:00 1 601
2 12-07-2011 00:15:00 2 111
2 12-07-2011 00:15:00 3 526
2 12-07-2011 00:20:00 1 418
2 12-07-2011 00:20:00 2 818
2 12-07-2011 00:20:00 3 578
2 12-07-2011 00:25:00 1 824
2 12-07-2011 00:25:00 2
2 12-07-2011 00:25:00 3 112
2 12-07-2011 00:30:00 1 649
2 12-07-2011 00:30:00 2
2 12-07-2011 00:30:00 3 689
2 12-07-2011 00:35:00 1
2 12-07-2011 00:35:00 2
2 12-07-2011 00:35:00 3
2 12-07-2011 00:40:00 1
2 12-07-2011 00:40:00 2
2 12-07-2011 00:40:00 3
2 12-07-2011 00:45:00 1
2 12-07-2011 00:45:00 2
2 12-07-2011 00:45:00 3
2 12-07-2011 00:50:00 1
2 12-07-2011 00:50:00 2
2 12-07-2011 00:50:00 3
2 12-07-2011 00:55:00 1 651
2 12-07-2011 00:55:00 2
2 12-07-2011 00:55:00 3 641

72 rows selected.


And here the results are correct. And when I add the rows for 08:55:00 afterwards:

SQL> insert into tradevolumes
2 select timeseries_id
3 , tradedate + interval '5' minute
4 , source_id
5 , 1000
6 from tradevolumes
7 where tradedate = (select max(tradedate) from tradevolumes)
8 /

4 rows created.

SQL> commit
2 /

Commit complete.

SQL> exec dbms_stats.gather_table_stats(user,'tradevolumes')

PL/SQL procedure successfully completed.

SQL> with all_tradedates as
2 ( select trunc(sysdate) + numtodsinterval(5*(level-1),'minute') tradedate
3 from dual
4 connect by level <= 12
5 )
6 select v.timeseries_id
7 , a.tradedate
8 , v.source_id
9 , v.volume
10 from all_tradedates a
11 left outer join tradevolumes v
12 partition by (v.timeseries_id,v.source_id)
13 on (a.tradedate = v.tradedate)
14 order by v.timeseries_id
15 , a.tradedate
16 , v.source_id
17 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 159
1 12-07-2011 00:00:00 2 333
1 12-07-2011 00:00:00 3 696
1 12-07-2011 00:05:00 1 674
1 12-07-2011 00:05:00 2 932
1 12-07-2011 00:05:00 3
1 12-07-2011 00:10:00 1 315
1 12-07-2011 00:10:00 2 872
1 12-07-2011 00:10:00 3 778
1 12-07-2011 00:15:00 1 398
1 12-07-2011 00:15:00 2 272
1 12-07-2011 00:15:00 3 239
1 12-07-2011 00:20:00 1 546
1 12-07-2011 00:20:00 2 863
1 12-07-2011 00:20:00 3 839
1 12-07-2011 00:25:00 1 558
1 12-07-2011 00:25:00 2 576
1 12-07-2011 00:25:00 3 296
1 12-07-2011 00:30:00 1 767
1 12-07-2011 00:30:00 2 957
1 12-07-2011 00:30:00 3 935
1 12-07-2011 00:35:00 1
1 12-07-2011 00:35:00 2
1 12-07-2011 00:35:00 3
1 12-07-2011 00:40:00 1
1 12-07-2011 00:40:00 2
1 12-07-2011 00:40:00 3
1 12-07-2011 00:45:00 1
1 12-07-2011 00:45:00 2
1 12-07-2011 00:45:00 3
1 12-07-2011 00:50:00 1
1 12-07-2011 00:50:00 2
1 12-07-2011 00:50:00 3
1 12-07-2011 00:55:00 1 398
1 12-07-2011 00:55:00 2 110
1 12-07-2011 00:55:00 3 629
2 12-07-2011 00:00:00 1 615
2 12-07-2011 00:00:00 2 924
2 12-07-2011 00:00:00 3 526
2 12-07-2011 00:05:00 1 147
2 12-07-2011 00:05:00 2 589
2 12-07-2011 00:05:00 3 779
2 12-07-2011 00:10:00 1 554
2 12-07-2011 00:10:00 2 108
2 12-07-2011 00:10:00 3 9
2 12-07-2011 00:15:00 1 601
2 12-07-2011 00:15:00 2 111
2 12-07-2011 00:15:00 3 526
2 12-07-2011 00:20:00 1 418
2 12-07-2011 00:20:00 2 818
2 12-07-2011 00:20:00 3 578
2 12-07-2011 00:25:00 1 824
2 12-07-2011 00:25:00 2
2 12-07-2011 00:25:00 3 112
2 12-07-2011 00:30:00 1 649
2 12-07-2011 00:30:00 2
2 12-07-2011 00:30:00 3 689
2 12-07-2011 00:35:00 1
2 12-07-2011 00:35:00 2
2 12-07-2011 00:35:00 3
2 12-07-2011 00:40:00 1
2 12-07-2011 00:40:00 2
2 12-07-2011 00:40:00 3
2 12-07-2011 00:45:00 1
2 12-07-2011 00:45:00 2
2 12-07-2011 00:45:00 3
2 12-07-2011 00:50:00 1
2 12-07-2011 00:50:00 2
2 12-07-2011 00:50:00 3
2 12-07-2011 00:55:00 1 651
2 12-07-2011 00:55:00 2
2 12-07-2011 00:55:00 3 641

72 rows selected.


And now it works!

And using 108 rows with a regular heap table isn't a problem either:

SQL> drop table tradevolumes purge
2 /

Table dropped.

SQL> create table tradevolumes
2 ( timeseries_id number(4)
3 , tradedate date
4 , source_id number(4)
5 , volume number(4)
6 , constraint tradevolumes_pk primary key (timeseries_id,tradedate,source_id)
7 )
8 /

Table created.

SQL> insert into tradevolumes
2 select 1 + mod(level-1,2)
3 , trunc(sysdate) + numtodsinterval(5*mod(trunc((level-1)/2),108),'minute')
4 , ceil(level/(2 * 108))
5 , ceil(dbms_random.value(0,1000))
6 from dual
7 connect by level <= 2 * 108 * 3
8 /

648 rows created.

SQL> delete tradevolumes
2 where ( tradedate
3 between trunc(sysdate) + interval '35' minute
4 and trunc(sysdate) + interval '50' minute
5 or (timeseries_id = 1 and source_id = 3 and volume < 30)
6 or (timeseries_id = 2 and source_id = 2 and tradedate > trunc(sysdate) + interval '20' minute)
7 )
8 /

126 rows deleted.

SQL> exec dbms_stats.gather_table_stats(user,'tradevolumes')

PL/SQL procedure successfully completed.

SQL> select *
2 from tradevolumes
3 order by timeseries_id
4 , tradedate
5 , source_id
6 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 200
1 12-07-2011 00:00:00 2 157
1 12-07-2011 00:00:00 3 950
1 12-07-2011 00:05:00 1 707
1 12-07-2011 00:05:00 2 932
1 12-07-2011 00:05:00 3 258
... skipped a lot of rows here ...
2 12-07-2011 08:50:00 1 61
2 12-07-2011 08:50:00 3 974
2 12-07-2011 08:55:00 1 692
2 12-07-2011 08:55:00 3 714

522 rows selected.

SQL> with all_tradedates as
2 ( select trunc(sysdate) + numtodsinterval(5*(level-1),'minute') tradedate
3 from dual
4 connect by level <= 12
5 )
6 select v.timeseries_id
7 , a.tradedate
8 , v.source_id
9 , v.volume
10 from all_tradedates a
11 left outer join tradevolumes v
12 partition by (v.timeseries_id,v.source_id)
13 on (a.tradedate = v.tradedate)
14 order by v.timeseries_id
15 , a.tradedate
16 , v.source_id
17 /

TIMESERIES_ID TRADEDATE SOURCE_ID VOLUME
------------- ------------------- ---------- ----------
1 12-07-2011 00:00:00 1 200
1 12-07-2011 00:00:00 2 157
1 12-07-2011 00:00:00 3 950
1 12-07-2011 00:05:00 1 707
1 12-07-2011 00:05:00 2 932
1 12-07-2011 00:05:00 3 258
1 12-07-2011 00:10:00 1 586
1 12-07-2011 00:10:00 2 193
1 12-07-2011 00:10:00 3 753
1 12-07-2011 00:15:00 1 287
1 12-07-2011 00:15:00 2 420
1 12-07-2011 00:15:00 3 113
1 12-07-2011 00:20:00 1 971
1 12-07-2011 00:20:00 2 93
1 12-07-2011 00:20:00 3 967
1 12-07-2011 00:25:00 1 685
1 12-07-2011 00:25:00 2 63
1 12-07-2011 00:25:00 3 832
1 12-07-2011 00:30:00 1 471
1 12-07-2011 00:30:00 2 764
1 12-07-2011 00:30:00 3 277
1 12-07-2011 00:35:00 1
1 12-07-2011 00:35:00 2
1 12-07-2011 00:35:00 3
1 12-07-2011 00:40:00 1
1 12-07-2011 00:40:00 2
1 12-07-2011 00:40:00 3
1 12-07-2011 00:45:00 1
1 12-07-2011 00:45:00 2
1 12-07-2011 00:45:00 3
1 12-07-2011 00:50:00 1
1 12-07-2011 00:50:00 2
1 12-07-2011 00:50:00 3
1 12-07-2011 00:55:00 1 332
1 12-07-2011 00:55:00 2 909
1 12-07-2011 00:55:00 3 873
2 12-07-2011 00:00:00 1 865
2 12-07-2011 00:00:00 2 982
2 12-07-2011 00:00:00 3 290
2 12-07-2011 00:05:00 1 936
2 12-07-2011 00:05:00 2 651
2 12-07-2011 00:05:00 3 141
2 12-07-2011 00:10:00 1 843
2 12-07-2011 00:10:00 2 818
2 12-07-2011 00:10:00 3 218
2 12-07-2011 00:15:00 1 85
2 12-07-2011 00:15:00 2 530
2 12-07-2011 00:15:00 3 650
2 12-07-2011 00:20:00 1 475
2 12-07-2011 00:20:00 2 988
2 12-07-2011 00:20:00 3 771
2 12-07-2011 00:25:00 1 422
2 12-07-2011 00:25:00 2
2 12-07-2011 00:25:00 3 48
2 12-07-2011 00:30:00 1 88
2 12-07-2011 00:30:00 2
2 12-07-2011 00:30:00 3 991
2 12-07-2011 00:35:00 1
2 12-07-2011 00:35:00 2
2 12-07-2011 00:35:00 3
2 12-07-2011 00:40:00 1
2 12-07-2011 00:40:00 2
2 12-07-2011 00:40:00 3
2 12-07-2011 00:45:00 1
2 12-07-2011 00:45:00 2
2 12-07-2011 00:45:00 3
2 12-07-2011 00:50:00 1
2 12-07-2011 00:50:00 2
2 12-07-2011 00:50:00 3
2 12-07-2011 00:55:00 1 793
2 12-07-2011 00:55:00 2
2 12-07-2011 00:55:00 3 601

72 rows selected.


The above was all executed on version 11.2.0.2. I also tried it on 10.2.0.4 and witnessed exactly the same phenomenon there; the only difference is that the boundary value is not 107/108 but 80/81.

I did some more basic investigation of space occupation and statistics, but I haven't found a clue so far. I'm wondering if I'm missing something obvious. So if you have more information about affected versions or the cause, I'd like to know.

Monday, July 11, 2011

Tutorials zur SQL Model Clause

Marcus Matzberger has translated five of my posts related to the SQL Model Clause for the German speaking Oracle community. Those posts now contain links to the German versions and I've listed them here for convenience as well.

SQL Model Clause Tutorial Part One

SQL Model Clause Tutorial Part Two

SQL Model Clause Tutorial Part Three

Calculating probabilities with N throws of a die

Choosing Between SQL and PL/SQL

Thanks Marcus!

Saturday, June 11, 2011

Back from Tallinn, Estonia

This morning I arrived back from a trip to Tallinn. Oracle Estonia had given me the opportunity to present my SQL Masterclass seminar at their training center in Tallinn on Thursday and Friday. Thanks to everyone who spent two days listening to me. Here is a short story about my trip, including some photos.

I arrived at Tallinn airport around 1PM local time on Wednesday. My hotel room was on the 18th floor and offered this splendid view of the old city of Tallinn:



I spent the rest of the afternoon taking a city tour by bus, learning a few things about the history of Tallinn and strolling for an hour or two through the old city. Here are a few of the photos that I took during that stroll:












The seminar itself was on Thursday and Friday. I had spent a huge amount of time preparing it. One of my goals was to present as many recognizable situations as possible that the attendees could apply in their daily jobs, so I was very pleased to receive feedback that they indeed found several immediately applicable queries. There is still room for improvement though, as a few parts were too short (I had planned one and a half hours per part) and I had overlooked an error or two in my scripts. Hopefully I will get a chance to fix those and present the seminar again somewhere. The feeling at the end was very positive though. The PowerPoint and the scripts can be found on the "Presentations and papers" page.

This is a picture of the building where Oracle Estonia and its training center are located:



PS: As expected, during the preparation of the seminar I learnt a couple of new things: a bug with partitioned outer joins and something about connect-by filtering. I will blog about them soon.

Thursday, May 26, 2011

Instrumentation packages

At CIBER we are doing a series of knowledge sessions in which several colleagues each give a one-hour presentation about one or two packages of their choice. We have already heard several very interesting presentations about DBMS_HPROF, DBMS_PARALLEL_EXECUTE, UTL_DBWS, UTL_TCP and DBMS_FGA. This evening a colleague will present DBMS_PREPROCESSOR and DBMS_WARNING, and I will cover the instrumentation-related packages LOGGER and DBMS_APPLICATION_INFO. I have added my presentation and the scripts to my Presentations and Papers page, and for convenience here as well:

Keynote presentation instrumentation packages

Powerpoint presentation instrumentation packages

Demonstration scripts instrumentation packages

Warning: they are in Dutch.

EDIT: I just received a message that the sessions will be rescheduled to a later date because of a large number of cancellations. Since the preparation is finished, I have decided to publish the materials anyway.

Wednesday, March 23, 2011

OGh APEX day 2011

Yesterday I attended OGh's APEX day. Last year's edition was already pretty successful, with 150 people attending. This year the maximum of 150 was reached in only a few days, so we scaled the event up to a maximum of 250 people, and in the end we still had to disappoint several people. This year we also had several sponsors: Oracle, Ciber, Transfer Solutions and Itium. These must be signs that APEX is becoming increasingly popular here in the Netherlands.

The day started with three plenary sessions. The first one was by Tony Andrews and Nigel Blair, who told us about their huge APEX project for Northgate Revenues and Benefits. They sell software to local councils, and their application contained approximately 1500 Forms. The application had moved from character mode to Oracle Forms to Oracle Web Forms, and their users complained that the Web Forms user interface deviated too much from their normal internet experience. The sales win rate dropped from 1 out of 2 to 1 out of 4 or less. With their new APEX application they focused on the user interface again and are now back to winning 50 to 60% of the deals. It was also good to see how Tony solved several technical problems, some of which are now part of APEX 4. This session was a great showcase that APEX is very much capable of handling large, enterprise-level applications.

The next session was Hilary Farrell's, about APEX 4.1. Hilary is part of the APEX development team. It always amazes me how much great work this relatively small team gets done. Over the last few years I think I've seen almost every team member present somewhere: all highly competent and passionate people, and Hilary is no exception. Lots of small improvements are on their way in APEX 4.1, of which I think I'm going to like the improved error handling the most. You can read more about the new version in this statement of direction.

Next up were Dimitri Gielis and John Scott with "APEX Evangelists' way of working in APEX projects". They showed how they handle several aspects of a new APEX project: from Ant scripts that generate a fresh project structure in Subversion, to continuous integration, their "Glitchi" project management APEX tool, and Balsamiq, with which they design the user interfaces together with the client.

In the afternoon I saw Jasper Alblas' session demonstrating Eneco's new debtor application. I worked at Eneco for more than six years, so I was eager to see this session. A very informative session and a nice-looking application. Well done, Jasper!

The second afternoon session I attended was John Scott's "Building APEX 4 Plugins". I always love John's presentations. He presents with humour and makes seemingly complex subjects appear simple, so you end up thinking "hey, I can do that myself without too much ado". I especially liked his Google Translate demos.

The last session I attended was Hans Wijnacker's session about Golfscores.eu, another well-delivered talk about an APEX application, this one keeping track of golf scores at tournaments. If you are organizing a golf tournament, you can hire them to take care of the IT. Hans faced some challenges similar to those I've faced with my own APEX application for Tourpoules (currently offline, by the way, waiting for 11g XE to be released ...).

I had a great time and I want to thank OGh, and especially Natalie Rohlof, who organized the event, for a fantastic day. I hope Learco and I will get the chance to set up a program for a third edition in 2012.

Thursday, March 17, 2011

Open cursor paranoia

Most PL/SQL developers will likely have witnessed this phenomenon several times during their career. But only in other people's code, of course :-). I'm talking about PL/SQL code where every program unit ends like this:

exception
  when others then
    if c%isopen
    then
      close c;
    end if;
    raise;
end;

where lines 3 to 6 are repeated for every cursor in the block above.

Proponents of open cursor paranoia justify it as defensive programming. Even when you open-fetch-close a cursor, they say, the fetch could raise an exception, in which case the close statement would not execute and the cursor would be left open. A next execution would then raise the dreaded "ORA-06511: PL/SQL: cursor already open".
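To make the feared scenario concrete, this is a minimal sketch of the pattern the proponents have in mind (the cursor name c and the variable name are just placeholders):

declare
  cursor c is select dummy from dual;
  l_dummy dual.dummy%type;
begin
  open c;
  fetch c into l_dummy; -- if this raises an exception ...
  close c;              -- ... this close is never reached
exception
  when others then
    if c%isopen
    then
      close c;          -- the "paranoid" cleanup
    end if;
    raise;
end;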

Let's examine this claim, first by defining which type of cursor we're talking about, since the term "cursor" is an overloaded one. Here I am talking about explicit session cursors in PL/SQL: the ones you declare and manage yourself. Now suppose you have this package:

SQL> create package pkg
2 as
3 procedure p;
4 end pkg;
5 /

Package created.

SQL> create package body pkg
2 as
3 procedure p
4 is
5 cursor c is select dummy from dual;
6 begin
7 open c;
8 end p;
9 end pkg;
10 /

Package body created.


What will the next piece of code do? Will it raise an ORA-06511: cursor already open?

SQL> begin
2 pkg.p;
3 pkg.p;
4 end;
5 /

No, it doesn't:

PL/SQL procedure successfully completed.

An explicit cursor variable has a scope, just like every other variable. Its scope here is the procedure, so when the procedure ends, the cursor variable is gone and you can safely do an "open c" again. Note that PL/SQL has a cursor cache that silently keeps the cursor open in the background, so that reopening this cursor will not cause a reparse, not even a soft parse:

SQL> select cursor_type
2 from v$open_cursor
3 where sql_text = 'SELECT DUMMY FROM DUAL'
4 /

CURSOR_TYPE
----------------------------------------------------------------
PL/SQL CURSOR CACHED

1 row selected.

By the way, in 11g Oracle added three extra columns to the V$OPEN_CURSOR view. The new column CURSOR_TYPE is very informative, as you can see above.

The cursor type is PL/SQL CURSOR CACHED, because the SESSION_CACHED_CURSORS parameter is on its default of 50 and the cursor was closed in the PL/SQL code. When you switch this parameter off by setting it to 0, the entry disappears from V$OPEN_CURSOR, and the cursor is closed in the background as well:

SQL> alter session set session_cached_cursors = 0
2 /

Session altered.

SQL> begin
2 pkg.p;
3 end;
4 /

PL/SQL procedure successfully completed.

SQL> select cursor_type
2 from v$open_cursor
3 where sql_text = 'SELECT DUMMY FROM DUAL'
4 /

no rows selected

SQL> alter session set session_cached_cursors = 50
2 /

Session altered.

Is there really no way that closing open cursors in the exception handler is useful? Maybe it once made sense in an earlier version? If you know of such a case, please leave a comment. It can be useful when the scope of the cursor is bigger than just the procedure, for example when it's declared globally in the package. In that case the package state holds the state of the cursor variable, so with this package:

SQL> create or replace package body pkg
2 as
3 cursor c is select dummy from dual
4 ;
5 procedure p
6 is
7 begin
8 open c;
9 end p;
10 end pkg;
11 /

Package body created.

, you do get the ORA-06511:

SQL> begin
2 pkg.p;
3 pkg.p;
4 end;
5 /
begin
*
ERROR at line 1:
ORA-06511: PL/SQL: cursor already open
ORA-06512: at "RWIJK.PKG", line 3
ORA-06512: at "RWIJK.PKG", line 8
ORA-06512: at line 3

And the cursor_type attribute of v$open_cursor now says:

SQL> select cursor_type
2 from v$open_cursor
3 where sql_text = 'SELECT DUMMY FROM DUAL '
4 /

CURSOR_TYPE
----------------------------------------------------------------
OPEN-PL/SQL

1 row selected.

But this raises the question of why you would ever want a globally defined cursor. Recently I've seen more than one application with so-called cursor packages: packages containing only cursors in their specification, serving as helper packages for the other, non-cursor packages in the application. A truly bad idea, not only because it fails to hide implementation details from the caller, but also because once the package is called, it has a state and the cursor variables don't get closed automatically, making the application vulnerable to ORA-06511 errors. Although I can imagine a cursor package in a situation where a mid-tier application wants to handle record sets without having to deal with the SQL; I don't have experience with that scenario myself though.
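For that mid-tier scenario, a function returning a ref cursor seems a cleaner construct than a cursor in a package specification, because each caller gets its own cursor instance and no package state is involved. A minimal sketch, with made-up names for illustration:

SQL> create or replace package record_api
  2  as
  3    function get_rows return sys_refcursor;
  4  end record_api;
  5  /

SQL> create or replace package body record_api
  2  as
  3    function get_rows return sys_refcursor
  4    is
  5      rc sys_refcursor;
  6    begin
  7      open rc for select dummy from dual;
  8      return rc; -- the caller fetches from and closes this cursor
  9    end get_rows;
 10  end record_api;
 11  /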

Odds are that when you see "if c%isopen then close c; end if;" constructs used abundantly throughout the code, most of them can happily be removed.
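In other words, for a locally declared cursor the plain open-fetch-close is all that's needed; no %isopen bookkeeping in the exception handler. A sketch of how the earlier example then looks (again with placeholder names):

declare
  cursor c is select dummy from dual;
  l_dummy dual.dummy%type;
begin
  open c;
  fetch c into l_dummy;
  close c;
  -- no exception handler needed for cursor cleanup:
  -- if an exception escapes, c goes out of scope anyway
end;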