Wednesday, March 28, 2012
At CIBER we are very proud to announce that Cary Millsap will give his one-day seminar Mastering Oracle Trace Data in the Netherlands. The event will take place at the Carlton President Hotel in Utrecht on Wednesday, May 23. You can register and read more about this event here.
The seminar is aimed at DBAs, database application developers, data warehouse specialists and anyone who cares about the speed of a database application. I'm sure that, among many other things, you'll leave the seminar with a very clear mindset about performance. If you are in doubt about whether to come, please read one of his many excellent papers, Thinking Clearly About Performance.
It's an opportunity you don't want to miss. Hope to see you there soon!
Tuesday, March 27, 2012
Third OGh APEX dag
Yesterday was the third annual APEX day, organized by the Dutch Oracle user group OGh. It's the biggest APEX-only event in the world, I've been told, with approximately 280 attendees. Learco Brizzi, Marti Koppelmans, and I were very proud to again have a great lineup of presenters and presentations.
The day started with a keynote by Patrick Wolf, who talked about and demonstrated many new 4.2 features. Of course, he could not be sure that every single feature will eventually make it into the 4.2 release. The APEX team has focused on mobile development. I was most impressed by the demo showing how relatively easy it will be to develop a great-looking iPhone application: a query, select a list type, and voilà. And I think this is why the Oracle community likes APEX so much: it makes developing applications easy. It reminded me of this presentation by Kathy Sierra: it makes you feel awesome.
Next up were the three parallel tracks. I heard quite a lot of people say they had a hard time choosing between great presentations, and I was no exception to that rule.
I saw Roel Hartman's "5 Cool Things you can do with HTML 5". Great presentation, especially the first real slide :-). I loved the WebSockets demo with Beaconpush, and I learned a lot.
The next presentation was John Scott's "APEX 4 - Error Handling enhancements". Having seen quite a number of John's presentations in the past, I know he always takes a subject to the next level, with last year's translation plugin as a highlight. This time he showed how easy it is to automatically log any errors in JIRA, so users don't have to report errors anymore: you already know about them. Unfortunately, his last demo didn't work. He wanted to show how to automatically add a screenshot from the user's browser to the JIRA ticket.
The fourth presentation was Sergei Martens' "Building desktop-like web applications with Ext JS & Apex". He was very enthusiastic about APEX applications using Ext JS, but he didn't quite succeed in explaining to me which aspects he liked so much. I agree that some of the built-in themes are quite boring, but I don't think the video showed a "sexy" theme. To me, a great design is a simple design, but that's a matter of personal preference. Next, he had severe trouble with a very slow Windows (a pleonasm?) laptop. This must be a presenter's nightmare, especially since he seemed well prepared, but it was out of his control. He handled the situation remarkably well though.
The last presentation of the day was Margreet den Hartigh's and Alex Nuijten's "From Zero to APEX": a customer story about a Uniface application that was rebuilt using APEX, with a team of Uniface developers who not only had to learn APEX, but also PL/SQL, HTML, CSS, Subversion, JavaScript and a lot more. Almost all Dutch municipalities use this application. It looks like they can gain a lot more by consolidating the databases of all municipalities into a single hosted database, with some partitioning, VPD and resource manager. The cost of ownership would decrease dramatically that way. Free tip from me :-)
A nice dinner with several visitors and presenters ended a great APEX day.
Sunday, March 25, 2012
Connect By Filtering
A hierarchical query is typically executed using a plan that starts with the operation CONNECT BY WITH FILTERING, which has two child operations. The first child operation implements the START WITH clause, and the second contains a step called CONNECT BY PUMP, implementing the recursive part of the query. Here is an example of such a plan, using the well-known hierarchical query on table EMP:
SQL> select lpad(' ', 2 * level - 2, ' ') || ename as ename
2 , level
3 , job
4 , deptno
5 from emp
6 connect by mgr = prior empno
7 start with mgr is null
8 /
ENAME LEVEL JOB DEPTNO
-------------------- ---------- --------------------------- ----------
KING 1 PRESIDENT 10
JONES 2 MANAGER 20
SCOTT 3 ANALYST 20
ADAMS 4 CLERK 20
FORD 3 ANALYST 20
SMITH 4 CLERK 20
BLAKE 2 MANAGER 30
ALLEN 3 SALESMAN 30
WARD 3 SALESMAN 30
MARTIN 3 SALESMAN 30
TURNER 3 SALESMAN 30
JAMES 3 CLERK 30
CLARK 2 MANAGER 10
MILLER 3 CLERK 10
14 rows selected.
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
2 /
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID d2c7xqxbr112u, child number 0
-------------------------------------
select lpad(' ', 2 * level - 2, ' ') || ename as ename , level , job , deptno from emp connect by
mgr = prior empno start with mgr is null
Plan hash value: 1869448388
--------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------
|* 1 | CONNECT BY WITH FILTERING| | 1 | | 14 |00:00:00.02 | 15 | 6 | 2048 | 2048 | 2048 (0)|
|* 2 | TABLE ACCESS FULL | EMP | 1 | 1 | 1 |00:00:00.01 | 3 | 6 | | | |
|* 3 | HASH JOIN | | 4 | | 13 |00:00:00.01 | 12 | 0 | 1452K| 1452K| 853K (0)|
| 4 | CONNECT BY PUMP | | 4 | | 14 |00:00:00.01 | 0 | 0 | | | |
| 5 | TABLE ACCESS FULL | EMP | 4 | 2 | 56 |00:00:00.01 | 12 | 0 | | | |
--------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("MGR"=PRIOR NULL)
2 - filter("MGR" IS NULL)
3 - access("MGR"=PRIOR NULL)
24 rows selected.
You can read a great, more detailed explanation of connect by with filtering here on Christian Antognini's blog.
When I was researching the new recursive subquery factoring clause a year and a half ago, and compared a standard hierarchical query on EMP using recursive subquery factoring with one using the good old connect by, I stumbled upon a new optimizer algorithm for implementing recursive queries:
SQL> select lpad(' ', 2 * level - 2, ' ') || ename as ename
2 , level
3 , job
4 , deptno
5 from emp
6 connect by mgr = prior empno
7 start with mgr is null
8 /
ENAME LEVEL JOB DEPTNO
-------------------- ---------- --------- ----------
KING 1 PRESIDENT 10
JONES 2 MANAGER 20
SCOTT 3 ANALYST 20
ADAMS 4 CLERK 20
FORD 3 ANALYST 20
SMITH 4 CLERK 20
BLAKE 2 MANAGER 30
ALLEN 3 SALESMAN 30
WARD 3 SALESMAN 30
MARTIN 3 SALESMAN 30
TURNER 3 SALESMAN 30
JAMES 3 CLERK 30
CLARK 2 MANAGER 10
MILLER 3 CLERK 10
14 rows selected.
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
2 /
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID d2c7xqxbr112u, child number 0
-------------------------------------
select lpad(' ', 2 * level - 2, ' ') || ename as ename , level
, job , deptno from emp connect by mgr = prior empno
start with mgr is null
Plan hash value: 763482334
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
-------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 14 |00:00:00.02 | 6 | 6 |
|* 1 | CONNECT BY NO FILTERING WITH START-WITH| | 1 | | 14 |00:00:00.02 | 6 | 6 |
| 2 | TABLE ACCESS FULL | EMP | 1 | 14 | 14 |00:00:00.02 | 6 | 6 |
-------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("MGR"=PRIOR NULL)
filter("MGR" IS NULL)
22 rows selected.
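As an aside, for anyone who has not used the recursive subquery factoring clause mentioned above: a roughly equivalent formulation of the EMP hierarchy is sketched below. This is my own illustration rather than part of the original test, and the names emp_tree, lvl and seq are arbitrary choices.

with emp_tree (empno, ename, job, deptno, lvl) as
( select empno, ename, job, deptno, 1
  from   emp
  where  mgr is null              -- anchor member: plays the role of START WITH
  union all
  select e.empno, e.ename, e.job, e.deptno, t.lvl + 1
  from   emp e
       , emp_tree t
  where  e.mgr = t.empno          -- recursive member: plays the role of CONNECT BY
)
search depth first by empno set seq
select lpad(' ', 2 * lvl - 2, ' ') || ename as ename
     , lvl
     , job
     , deptno
  from emp_tree
 order by seq
/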
You might wonder what I did to make two exactly identical queries use different execution plans; I'll address that later. First, I'd like to show that there are two optimizer hints available with which you can control which algorithm the optimizer uses:
SQL> select *
2 from v$sql_hint
3 where name like '%CONNECT_BY_FILTERING%'
4 /
NAME                    SQL_FEATURE  CLASS
----------------------- ------------ -----------------------
INVERSE                 TARGET_LEVEL   PROPERTY VERSION    VERSION_OUTLINE
----------------------- ------------ ---------- ---------- ---------------
CONNECT_BY_FILTERING    QKSFM_ALL    CONNECT_BY_FILTERING
NO_CONNECT_BY_FILTERING            2         16 10.2.0.2   10.2.0.2

NO_CONNECT_BY_FILTERING QKSFM_ALL    CONNECT_BY_FILTERING
CONNECT_BY_FILTERING               2         16 10.2.0.2   10.2.0.2
2 rows selected.
This surprised me: as the VERSION column suggests, the no_connect_by_filtering hint and the accompanying new algorithm were already introduced in version 10.2.0.2! I checked my old 10.2.0.4 database, and the hint is indeed present and can be used there:
SQL> select version
2 from v$instance
3 /
VERSION
---------------------------------------------------
10.2.0.4.0
1 row selected.
SQL> select /*+ no_connect_by_filtering gather_plan_statistics */
2 lpad(' ', 2 * level - 2, ' ') || ename as ename
3 , level
4 , job
5 , deptno
6 from emp
7 connect by mgr = prior empno
8 start with mgr is null
9 /
ENAME LEVEL JOB DEPTNO
-------------------- ---------- --------------------------- ----------
KING 1 PRESIDENT 10
JONES 2 MANAGER 20
SCOTT 3 ANALYST 20
ADAMS 4 CLERK 20
FORD 3 ANALYST 20
SMITH 4 CLERK 20
BLAKE 2 MANAGER 30
ALLEN 3 SALESMAN 30
WARD 3 SALESMAN 30
MARTIN 3 SALESMAN 30
TURNER 3 SALESMAN 30
JAMES 3 CLERK 30
CLARK 2 MANAGER 10
MILLER 3 CLERK 10
14 rows selected.
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
2 /
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 39kr5s8dxz7j0, child number 0
-------------------------------------
select /*+ no_connect_by_filtering gather_plan_statistics */ lpad(' ', 2 * level - 2, '
') || ename as ename , level , job , deptno from emp connect by mgr = prior
empno start with mgr is null
Plan hash value: 763482334
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------
|* 1 | CONNECT BY NO FILTERING WITH START-WITH| | 1 | | 14 |00:00:00.01 | 3 |
| 2 | TABLE ACCESS FULL | EMP | 1 | 14 | 14 |00:00:00.01 | 3 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("MGR"=PRIOR NULL)
filter("MGR" IS NULL)
21 rows selected.
But you need the no_connect_by_filtering hint in version 10.2.0.4 for this query. If you do not provide the hint, this is the result:
SQL> select /*+ gather_plan_statistics */
2 lpad(' ', 2 * level - 2, ' ') || ename as ename
3 , level
4 , job
5 , deptno
6 from emp
7 connect by mgr = prior empno
8 start with mgr is null
9 /
ENAME LEVEL JOB DEPTNO
-------------------- ---------- --------------------------- ----------
KING 1 PRESIDENT 10
JONES 2 MANAGER 20
SCOTT 3 ANALYST 20
ADAMS 4 CLERK 20
FORD 3 ANALYST 20
SMITH 4 CLERK 20
BLAKE 2 MANAGER 30
ALLEN 3 SALESMAN 30
WARD 3 SALESMAN 30
MARTIN 3 SALESMAN 30
TURNER 3 SALESMAN 30
JAMES 3 CLERK 30
CLARK 2 MANAGER 10
MILLER 3 CLERK 10
14 rows selected.
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
2 /
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 6zhtnf720u0bm, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ lpad(' ', 2 * level - 2, ' ') || ename as ename , level
, job , deptno from emp connect by mgr = prior empno start with mgr is null
Plan hash value: 1869448388
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------
|* 1 | CONNECT BY WITH FILTERING| | 1 | | 14 |00:00:00.01 | 15 | 2048 | 2048 | 2048 (0)|
|* 2 | TABLE ACCESS FULL | EMP | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
|* 3 | HASH JOIN | | 4 | | 13 |00:00:00.01 | 12 | 1452K| 1452K| 843K (0)|
| 4 | CONNECT BY PUMP | | 4 | | 14 |00:00:00.01 | 0 | | | |
| 5 | TABLE ACCESS FULL | EMP | 4 | 2 | 56 |00:00:00.01 | 12 | | | |
-----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("MGR"=PRIOR NULL)
2 - filter("MGR" IS NULL)
3 - access("MGR"=PRIOR NULL)
24 rows selected.
This explains why I hadn't seen CONNECT BY NO FILTERING WITH START-WITH earlier. It seems that Oracle adjusted the cost calculation of connect by queries somewhere between 10.2.0.4 and 11.2.0.2. Just look at the cost in both execution plans on 10.2.0.4, using a regular explain plan statement and a "select * from table(dbms_xplan.display)":
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 50 | 3 (0)| 00:00:01 |
|* 1 | CONNECT BY WITH FILTERING| | | | | |
|* 2 | TABLE ACCESS FULL | EMP | 1 | 29 | 3 (0)| 00:00:01 |
|* 3 | HASH JOIN | | | | | |
| 4 | CONNECT BY PUMP | | | | | |
| 5 | TABLE ACCESS FULL | EMP | 2 | 50 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 350 | 3 (0)| 00:00:01 |
|* 1 | CONNECT BY NO FILTERING WITH START-WITH| | | | | |
| 2 | TABLE ACCESS FULL | EMP | 14 | 350 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
The cost of 3 is due to the full table scan of EMP, and no additional cost is added for the hierarchical query.
These are the plans from 11.2.0.2:
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 156 | 15 (20)| 00:00:01 |
|* 1 | CONNECT BY WITH FILTERING| | | | | |
|* 2 | TABLE ACCESS FULL | EMP | 1 | 25 | 4 (0)| 00:00:01 |
|* 3 | HASH JOIN | | 2 | 76 | 9 (12)| 00:00:01 |
| 4 | CONNECT BY PUMP | | | | | |
|* 5 | TABLE ACCESS FULL | EMP | 13 | 325 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 728 | 5 (20)| 00:00:01 |
|* 1 | CONNECT BY NO FILTERING WITH START-WITH| | | | | |
| 2 | TABLE ACCESS FULL | EMP | 14 | 350 | 4 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
The numbers from 11.2.0.2 show more sophistication than just the cost of the table scan. The optimizer can't know how many levels deep the data goes, but version 10.2.0.4 apparently assumed one level and left the total cost unchanged at 3. I'm curious in which version between 10.2.0.4 and 11.2.0.2 this cost calculation changed. If anyone reading this has a version in between and would like to check (a minimal check is sketched below), please let me know in the comments. My guess is that the change was introduced in 11.2.0.1.
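If you would like to check your own version, the minimal test below (my suggestion, using nothing beyond what is already shown in this post) should be enough: the Cost column and the connect by operation chosen for the unhinted query tell you on which side of the change your release falls.

explain plan for
select ename, level, job, deptno
from   emp
connect by mgr = prior empno
start with mgr is null;

select * from table(dbms_xplan.display);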
What does CONNECT BY NO FILTERING WITH START-WITH do?
Let's explore this, using this table:
SQL> create table t (id, parent_id, value, indicator)
2 as
3 select level - 1
4 , case level when 1 then null else trunc((level-1)/10) end
5 , round(dbms_random.value * 1000)
6 , case mod(level,10) when 4 then 'N' else 'Y' end
7 from dual
8 connect by level <= 100000
9 /
Table created.
SQL> alter table t
2 add constraint cbt_pk
3 primary key (id)
4 /
Table altered.
SQL> create index i1 on t (parent_id,indicator)
2 /
Index created.
SQL> exec dbms_stats.gather_table_stats(user,'t',cascade=>true)
The data is tree-shaped: each parent node has exactly nine child nodes. One tenth of the rows, those with an id ending in the digit 3, have their indicator column set to 'N'. This select query makes it clearer what the data looks like:
SQL> select *
2 from t
3 where id < 24 or id > 99997
4 order by id
5 /
ID PARENT_ID VALUE I
---------- ---------- ---------- -
0 656 Y
1 0 289 Y
2 0 365 Y
3 0 644 N
4 0 364 Y
5 0 841 Y
6 0 275 Y
7 0 529 Y
8 0 500 Y
9 0 422 Y
10 1 598 Y
11 1 104 Y
12 1 467 Y
13 1 296 N
14 1 105 Y
15 1 220 Y
16 1 692 Y
17 1 793 Y
18 1 29 Y
19 1 304 Y
20 2 467 Y
21 2 716 Y
22 2 837 Y
23 2 432 N
99998 9999 609 Y
99999 9999 24 Y
26 rows selected.
When I hear the word "filter", I almost immediately associate it with a WHERE clause. But a where clause in a connect by query is not what is meant by connect by filtering. The documentation states:
Oracle processes hierarchical queries as follows:
1. A join, if present, is evaluated first, whether the join is specified in the FROM clause or with WHERE clause predicates.
2. The CONNECT BY condition is evaluated.
3. Any remaining WHERE clause predicates are evaluated.
So a where clause predicate is evaluated AFTER the connect by has done its job. You can see that happening here:
SQL> explain plan
2 for
3 select id
4 , parent_id
5 , sys_connect_by_path(id,'->') scbp
6 from t
7 where indicator = 'N'
8 connect by parent_id = prior id
9 start with parent_id is null
10 /
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2502271019
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 11 | 319 | 164 (3)| 00:00:02 |
|* 1 | FILTER | | | | | |
|* 2 | CONNECT BY WITH FILTERING | | | | | |
|* 3 | TABLE ACCESS FULL | T | 1 | 11 | 80 (2)| 00:00:01 |
| 4 | NESTED LOOPS | | 10 | 240 | 82 (2)| 00:00:01 |
| 5 | CONNECT BY PUMP | | | | | |
| 6 | TABLE ACCESS BY INDEX ROWID| T | 10 | 110 | 2 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | I1 | 10 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("INDICATOR"='N')
2 - access("PARENT_ID"=PRIOR "ID")
3 - filter("PARENT_ID" IS NULL)
7 - access("PARENT_ID"="connect$_by$_pump$_002"."prior id ")
22 rows selected.
The "indicator = 'N'" predicate is at step 1, which is executed after the CONNECT BY WITH FILTERING at step 2. Note that although this query is executed in 11.2.0.2, the optimizer has chosen the old CONNECT BY WITH FILTERING.
Connect by filtering means placing filters in your CONNECT BY clause. Here is an example using the predicate "indicator = 'N'" inside the CONNECT BY clause:
SQL> select id
2 , parent_id
3 , sys_connect_by_path(id,'->') scbp
4 from t
5 connect by parent_id = prior id
6 and indicator = 'N'
7 start with parent_id is null
8 /
ID PARENT_ID SCBP
---------- ---------- --------------------------------------------------
0 ->0
3 0 ->0->3
33 3 ->0->3->33
333 33 ->0->3->33->333
3333 333 ->0->3->33->333->3333
33333 3333 ->0->3->33->333->3333->33333
6 rows selected.
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
2 /
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID dzkjzrrzgnvd5, child number 0
-------------------------------------
select id , parent_id , sys_connect_by_path(id,'->') scbp
from t connect by parent_id = prior id and indicator = 'N'
start with parent_id is null
Plan hash value: 3164577763
---------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 6 |00:00:00.01 | 294 | | | |
|* 1 | CONNECT BY WITH FILTERING | | 1 | | 6 |00:00:00.01 | 294 | 2048 | 2048 | 2048 (0)|
|* 2 | TABLE ACCESS FULL | T | 1 | 1 | 1 |00:00:00.01 | 277 | | | |
| 3 | NESTED LOOPS | | 6 | 5 | 5 |00:00:00.01 | 17 | | | |
| 4 | CONNECT BY PUMP | | 6 | | 6 |00:00:00.01 | 0 | | | |
| 5 | TABLE ACCESS BY INDEX ROWID| T | 6 | 5 | 5 |00:00:00.01 | 17 | | | |
|* 6 | INDEX RANGE SCAN | I1 | 6 | 5 | 5 |00:00:00.01 | 12 | | | |
---------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("PARENT_ID"=PRIOR NULL)
2 - filter("PARENT_ID" IS NULL)
6 - access("PARENT_ID"="connect$_by$_pump$_002"."prior id " AND "INDICATOR"='N')
27 rows selected.
In the A-Rows column, you can see that connect by filtering was effective here: only the necessary rows were read. And this is the key difference between the two connect by algorithms: with CONNECT BY WITH FILTERING, you can filter within each recursion step, whereas CONNECT BY NO FILTERING WITH START-WITH has to read everything, do the work in memory, and then return the result. With this example, the latter is much less efficient:
SQL> select /*+ no_connect_by_filtering */ id
2 , parent_id
3 , sys_connect_by_path(id,'->') scbp
4 from t
5 connect by parent_id = prior id
6 and indicator = 'N'
7 start with parent_id is null
8 /
ID PARENT_ID SCBP
---------- ---------- --------------------------------------------------
0 ->0
3 0 ->0->3
33 3 ->0->3->33
333 33 ->0->3->33->333
3333 333 ->0->3->33->333->3333
33333 3333 ->0->3->33->333->3333->33333
6 rows selected.
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
2 /
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 3fcr31tp83by9, child number 0
-------------------------------------
select /*+ no_connect_by_filtering */ id , parent_id ,
sys_connect_by_path(id,'->') scbp from t connect by parent_id =
prior id and indicator = 'N' start with parent_id is null
Plan hash value: 2303479083
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 6 |00:00:00.14 | 277 |
|* 1 | CONNECT BY NO FILTERING WITH START-WITH| | 1 | | 6 |00:00:00.14 | 277 |
| 2 | TABLE ACCESS FULL | T | 1 | 100K| 100K|00:00:00.01 | 277 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("PARENT_ID"=PRIOR NULL)
filter("PARENT_ID" IS NULL)
22 rows selected.
100K rows were read, and the A-Time was 0.14 seconds instead of 0.01 seconds. I wondered where those 0.14 seconds went, since the plan shows they are NOT spent in the full table scan. Using Tom Kyte's runstats_pkg reveals this:
SQL> declare
2 cursor c1
3 is
4 select /*+ connect_by_filtering */ id
5 , parent_id
6 , sys_connect_by_path(id,'->') scbp
7 from t
8 connect by parent_id = prior id
9 and indicator = 'N'
10 start with parent_id is null
11 ;
12 cursor c2
13 is
14 select /*+ no_connect_by_filtering */ id
15 , parent_id
16 , sys_connect_by_path(id,'->') scbp
17 from t
18 connect by parent_id = prior id
19 and indicator = 'N'
20 start with parent_id is null
21 ;
22 begin
23 runstats_pkg.rs_start;
24 for r in c1 loop null; end loop;
25 runstats_pkg.rs_middle;
26 for r in c2 loop null; end loop;
27 runstats_pkg.rs_stop;
28 end;
29 /
Run1 ran in 0 hsecs
Run2 ran in 10 hsecs
run 1 ran in 0% of the time
Name Run1 Run2 Diff
STAT...HSC Heap Segment Block 16 15 -1
STAT...db block changes 48 47 -1
STAT...consistent gets - exami 9 8 -1
STAT...db block gets from cach 32 33 1
STAT...db block gets 32 33 1
STAT...redo subscn max counts 0 1 1
STAT...redo ordering marks 0 1 1
STAT...redo entries 16 15 -1
STAT...calls to kcmgas 0 1 1
STAT...calls to kcmgcs 29 28 -1
STAT...free buffer requested 0 1 1
STAT...Heap Segment Array Inse 16 15 -1
STAT...consistent changes 32 31 -1
STAT...heap block compress 9 8 -1
STAT...parse time cpu 1 0 -1
STAT...buffer is pinned count 1 0 -1
STAT...session cursor cache co 1 0 -1
STAT...sql area evicted 1 0 -1
LATCH.undo global data 11 10 -1
LATCH.SQL memory manager worka 3 5 2
LATCH.messages 0 2 2
LATCH.OS process allocation 0 2 2
LATCH.simulator hash latch 20 23 3
LATCH.object queue header oper 4 1 -3
STAT...workarea executions - o 10 6 -4
STAT...table fetch by rowid 15 10 -5
STAT...index scans kdiixs1 6 0 -6
LATCH.row cache objects 280 274 -6
STAT...sorts (memory) 8 2 -6
STAT...CPU used by this sessio 2 11 9
STAT...Elapsed Time 1 11 10
STAT...recursive cpu usage 2 12 10
STAT...no work - consistent re 300 284 -16
STAT...buffer is not pinned co 36 20 -16
STAT...session logical reads 354 337 -17
STAT...consistent gets from ca 313 296 -17
STAT...consistent gets from ca 322 304 -18
LATCH.shared pool 186 168 -18
STAT...consistent gets 322 304 -18
LATCH.shared pool simulator 23 4 -19
LATCH.cache buffers chains 785 740 -45
STAT...undo change vector size 3,500 3,420 -80
STAT...redo size 4,652 4,560 -92
STAT...session uga memory 0 -65,488 -65,488
STAT...session pga memory 0 -65,536 -65,536
STAT...sorts (rows) 12 100,001 99,989
Run1 latches total versus runs -- difference and pct
Run1 Run2 Diff Pct
1,467 1,384 -83 106.00%
PL/SQL procedure successfully completed
The major difference is the number of rows sorted! CONNECT BY NO FILTERING WITH START-WITH sorts all 100K rows. This is a surprise, because normally when you sort, you use memory from the PGA workarea, which shows up in the memory statistics of your execution plan. But the no-filtering plan did not show those statistics (OMem, 1Mem, Used-Mem). I have no explanation for this phenomenon yet.
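One thing I would check next, purely as a suggestion of my own and not part of the original investigation, is whether the cursor registered any SQL work areas at all. The sql_id below is the one reported earlier for the no_connect_by_filtering run.

select operation_type
     , operation_id
     , policy
     , last_memory_used
     , total_executions
from   v$sql_workarea
where  sql_id = '3fcr31tp83by9';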
Let's zoom in on the sorting:
SQL> select sn.name
2 , ms.value
3 from v$mystat ms
4 , v$statname sn
5 where ms.statistic# = sn.statistic#
6 and sn.name like '%sort%'
7 /
NAME VALUE
----------------------- ----------
sorts (memory) 2278
sorts (disk) 0
sorts (rows) 9425510
3 rows selected.
SQL> select id
2 , parent_id
3 , sys_connect_by_path(id,'->') scbp
4 from t
5 connect by parent_id = prior id
6 and indicator = 'N'
7 start with parent_id is null
8 /
ID PARENT_ID SCBP
---------- ---------- --------------------------------------------------
0 ->0
3 0 ->0->3
33 3 ->0->3->33
333 33 ->0->3->33->333
3333 333 ->0->3->33->333->3333
33333 3333 ->0->3->33->333->3333->33333
6 rows selected.
SQL> select sn.name
2 , ms.value
3 from v$mystat ms
4 , v$statname sn
5 where ms.statistic# = sn.statistic#
6 and sn.name like '%sort%'
7 /
NAME VALUE
----------------------- ----------
sorts (memory) 2286
sorts (disk) 0
sorts (rows) 9425522
3 rows selected.
SQL> select /*+ no_connect_by_filtering */ id
2 , parent_id
3 , sys_connect_by_path(id,'->') scbp
4 from t
5 connect by parent_id = prior id
6 and indicator = 'N'
7 start with parent_id is null
8 /
ID PARENT_ID SCBP
---------- ---------- --------------------------------------------------
0 ->0
3 0 ->0->3
33 3 ->0->3->33
333 33 ->0->3->33->333
3333 333 ->0->3->33->333->3333
33333 3333 ->0->3->33->333->3333->33333
6 rows selected.
SQL> select sn.name
2 , ms.value
3 from v$mystat ms
4 , v$statname sn
5 where ms.statistic# = sn.statistic#
6 and sn.name like '%sort%'
7 /
NAME VALUE
----------------------- ----------
sorts (memory) 2288
sorts (disk) 0
sorts (rows) 9525523
3 rows selected.
So CONNECT BY WITH FILTERING did 8 sorts (2286 - 2278) and sorted 12 rows (9425522 - 9425510), whereas CONNECT BY NO FILTERING WITH START-WITH did 2 sorts (2288 - 2286) and sorted 100,001 rows (9525523 - 9425522).
And finally, I promised to explain why the first two queries of this blog post are identical but show different execution plans. The reason is simple: the first one was executed on 10.2.0.4 and the second one on 11.2.0.2.