About Oracle: Connect By Filtering

A hierarchical query is typically executed using a plan that starts with the operation CONNECT BY WITH FILTERING, which has two child operations. The first child operation implements the START WITH clause and the second child operation contains a step called CONNECT BY PUMP, implementing the recursive part of the query. Here is an example of such a plan using the well known hierarchical query on table EMP:

SQL>  select lpad(' ', 2 * level - 2, ' ') || ename as ename
  2        , level
  3        , job
  4        , deptno
  5     from emp
  6  connect by mgr = prior empno
  7    start with mgr is null
  8  /

ENAME                     LEVEL JOB                             DEPTNO
-------------------- ---------- --------------------------- ----------
KING                          1 PRESIDENT                           10
  JONES                       2 MANAGER                             20
    SCOTT                     3 ANALYST                             20
      ADAMS                   4 CLERK                               20
    FORD                      3 ANALYST                             20
      SMITH                   4 CLERK                               20
  BLAKE                       2 MANAGER                             30
    ALLEN                     3 SALESMAN                            30
    WARD                      3 SALESMAN                            30
    MARTIN                    3 SALESMAN                            30
    TURNER                    3 SALESMAN                            30
    JAMES                     3 CLERK                               30
  CLARK                       2 MANAGER                             10
    MILLER                    3 CLERK                               10

14 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  d2c7xqxbr112u, child number 0
-------------------------------------
 select lpad(' ', 2 * level - 2, ' ') || ename as ename       , level       , job       , deptno    from emp connect by
mgr = prior empno   start with mgr is null

Plan hash value: 1869448388

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------
|*  1 |  CONNECT BY WITH FILTERING|      |      1 |        |     14 |00:00:00.02 |      15 |      6 |  2048 |  2048 | 2048  (0)|
|*  2 |   TABLE ACCESS FULL       | EMP  |      1 |      1 |      1 |00:00:00.01 |       3 |      6 |       |       |          |
|*  3 |   HASH JOIN               |      |      4 |        |     13 |00:00:00.01 |      12 |      0 |  1452K|  1452K|  853K (0)|
|   4 |    CONNECT BY PUMP        |      |      4 |        |     14 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |    TABLE ACCESS FULL      | EMP  |      4 |      2 |     56 |00:00:00.01 |      12 |      0 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("MGR"=PRIOR NULL)
   2 - filter("MGR" IS NULL)
   3 - access("MGR"=PRIOR NULL)


24 rows selected.

You can see a great and more detailed explanation of connect by with filtering here on Christian Antognini's blog.

When I was researching the new recursive subquery factoring clause one and a half year ago, and compared a standard hierarchical query on EMP using recursive subquery factoring with a query using the good old connect by, I stumbled upon a new optimizer algorithm for implementing recursive queries:

SQL>  select lpad(' ', 2 * level - 2, ' ') || ename as ename
  2        , level
  3        , job
  4        , deptno
  5     from emp
  6  connect by mgr = prior empno
  7    start with mgr is null
  8  /

ENAME                     LEVEL JOB           DEPTNO
-------------------- ---------- --------- ----------
KING                          1 PRESIDENT         10
  JONES                       2 MANAGER           20
    SCOTT                     3 ANALYST           20
      ADAMS                   4 CLERK             20
    FORD                      3 ANALYST           20
      SMITH                   4 CLERK             20
  BLAKE                       2 MANAGER           30
    ALLEN                     3 SALESMAN          30
    WARD                      3 SALESMAN          30
    MARTIN                    3 SALESMAN          30
    TURNER                    3 SALESMAN          30
    JAMES                     3 CLERK             30
  CLARK                       2 MANAGER           10
    MILLER                    3 CLERK             10

14 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  d2c7xqxbr112u, child number 0
-------------------------------------
 select lpad(' ', 2 * level - 2, ' ') || ename as ename       , level
    , job       , deptno    from emp connect by mgr = prior empno
start with mgr is null

Plan hash value: 763482334

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |      |      1 |        |     14 |00:00:00.02 |       6 |      6 |
|*  1 |  CONNECT BY NO FILTERING WITH START-WITH|      |      1 |        |     14 |00:00:00.02 |       6 |      6 |
|   2 |   TABLE ACCESS FULL                     | EMP  |      1 |     14 |     14 |00:00:00.02 |       6 |      6 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("MGR"=PRIOR NULL)
       filter("MGR" IS NULL)


22 rows selected.

You might wonder what I did to make two exactly the same queries to use a different execution plan, but I'll address that later. First, I'd like to show there are two optimizer hints available, with which you can control which algorithm the optimizer uses:

SQL> select *
  2    from v$sql_hint
  3   where name like '%CONNECT_BY_FILTERING%'
  4  /

NAME                    SQL_FEATURE  CLASS
----------------------- ------------ -----------------------
INVERSE                 TARGET_LEVEL   PROPERTY VERSION    VERSION_OUTLINE
----------------------- ------------ ---------- ---------- ---------------
CONNECT_BY_FILTERING    QKSFM_ALL    CONNECT_BY_FILTERING
NO_CONNECT_BY_FILTERING            2         16 10.2.0.2   10.2.0.2

NO_CONNECT_BY_FILTERING QKSFM_ALL    CONNECT_BY_FILTERING
CONNECT_BY_FILTERING               2         16 10.2.0.2   10.2.0.2


2 rows selected.

And this was surprising to me. As the version column suggests, the no_connect_by_filtering hint and the accompanying new algorithm were already introduced in version 10.2.0.2! I checked with my old 10.2.0.4 database and it is indeed present and can be used there:

SQL> select version
  2    from v$instance
  3  /

VERSION
---------------------------------------------------
10.2.0.4.0

1 row selected.

SQL>  select /*+ no_connect_by_filtering gather_plan_statistics */
  2          lpad(' ', 2 * level - 2, ' ') || ename as ename
  3        , level
  4        , job
  5        , deptno
  6     from emp
  7  connect by mgr = prior empno
  8    start with mgr is null
  9  /

ENAME                     LEVEL JOB                             DEPTNO
-------------------- ---------- --------------------------- ----------
KING                          1 PRESIDENT                           10
  JONES                       2 MANAGER                             20
    SCOTT                     3 ANALYST                             20
      ADAMS                   4 CLERK                               20
    FORD                      3 ANALYST                             20
      SMITH                   4 CLERK                               20
  BLAKE                       2 MANAGER                             30
    ALLEN                     3 SALESMAN                            30
    WARD                      3 SALESMAN                            30
    MARTIN                    3 SALESMAN                            30
    TURNER                    3 SALESMAN                            30
    JAMES                     3 CLERK                               30
  CLARK                       2 MANAGER                             10
    MILLER                    3 CLERK                               10

14 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  39kr5s8dxz7j0, child number 0
-------------------------------------
 select /*+ no_connect_by_filtering gather_plan_statistics */         lpad(' ', 2 * level - 2, '
') || ename as ename       , level       , job       , deptno    from emp connect by mgr = prior
empno   start with mgr is null

Plan hash value: 763482334

----------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|*  1 |  CONNECT BY NO FILTERING WITH START-WITH|      |      1 |        |     14 |00:00:00.01 |       3 |
|   2 |   TABLE ACCESS FULL                     | EMP  |      1 |     14 |     14 |00:00:00.01 |       3 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("MGR"=PRIOR NULL)
       filter("MGR" IS NULL)


21 rows selected.

But you need the no_connect_by_filtering hint in version 10.2.0.4 for this query. If you do not provide the hint, this is the result:

SQL>  select /*+ gather_plan_statistics */
  2          lpad(' ', 2 * level - 2, ' ') || ename as ename
  3        , level
  4        , job
  5        , deptno
  6     from emp
  7  connect by mgr = prior empno
  8    start with mgr is null
  9  /

ENAME                     LEVEL JOB                             DEPTNO
-------------------- ---------- --------------------------- ----------
KING                          1 PRESIDENT                           10
  JONES                       2 MANAGER                             20
    SCOTT                     3 ANALYST                             20
      ADAMS                   4 CLERK                               20
    FORD                      3 ANALYST                             20
      SMITH                   4 CLERK                               20
  BLAKE                       2 MANAGER                             30
    ALLEN                     3 SALESMAN                            30
    WARD                      3 SALESMAN                            30
    MARTIN                    3 SALESMAN                            30
    TURNER                    3 SALESMAN                            30
    JAMES                     3 CLERK                               30
  CLARK                       2 MANAGER                             10
    MILLER                    3 CLERK                               10

14 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  6zhtnf720u0bm, child number 0
-------------------------------------
 select /*+ gather_plan_statistics */         lpad(' ', 2 * level - 2, ' ') || ename as ename       , level
   , job       , deptno    from emp connect by mgr = prior empno   start with mgr is null

Plan hash value: 1869448388

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------
|*  1 |  CONNECT BY WITH FILTERING|      |      1 |        |     14 |00:00:00.01 |      15 |  2048 |  2048 | 2048  (0)|
|*  2 |   TABLE ACCESS FULL       | EMP  |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
|*  3 |   HASH JOIN               |      |      4 |        |     13 |00:00:00.01 |      12 |  1452K|  1452K|  843K (0)|
|   4 |    CONNECT BY PUMP        |      |      4 |        |     14 |00:00:00.01 |       0 |       |       |          |
|   5 |    TABLE ACCESS FULL      | EMP  |      4 |      2 |     56 |00:00:00.01 |      12 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("MGR"=PRIOR NULL)
   2 - filter("MGR" IS NULL)
   3 - access("MGR"=PRIOR NULL)


24 rows selected.

Which explains why I didn't see the CONNECT BY NO FILTERING WITH START-WITH earlier. It seems that Oracle has adjusted the cost calculation of connect by queries somewhere between 10.2.0.4 and 11.2.0.1. Just look at the cost from both execution plans on 10.2.0.4 using a regular explain plan statement and a "select * from table(dbms_xplan.display):

----------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |     2 |    50 |     3   (0)| 00:00:01 |
|*  1 |  CONNECT BY WITH FILTERING|      |       |       |            |          |
|*  2 |   TABLE ACCESS FULL       | EMP  |     1 |    29 |     3   (0)| 00:00:01 |
|*  3 |   HASH JOIN               |      |       |       |            |          |
|   4 |    CONNECT BY PUMP        |      |       |       |            |          |
|   5 |    TABLE ACCESS FULL      | EMP  |     2 |    50 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |      |    14 |   350 |     3   (0)| 00:00:01 |
|*  1 |  CONNECT BY NO FILTERING WITH START-WITH|      |       |       |            |          |
|   2 |   TABLE ACCESS FULL                     | EMP  |    14 |   350 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

The cost of 3 is due to the full table scan of EMP, and no additional cost is added for the hierarchical query.

These are the plans from 11.2.0.2:

----------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |     3 |   156 |    15  (20)| 00:00:01 |
|*  1 |  CONNECT BY WITH FILTERING|      |       |       |            |          |
|*  2 |   TABLE ACCESS FULL       | EMP  |     1 |    25 |     4   (0)| 00:00:01 |
|*  3 |   HASH JOIN               |      |     2 |    76 |     9  (12)| 00:00:01 |
|   4 |    CONNECT BY PUMP        |      |       |       |            |          |
|*  5 |    TABLE ACCESS FULL      | EMP  |    13 |   325 |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |      |    14 |   728 |     5  (20)| 00:00:01 |
|*  1 |  CONNECT BY NO FILTERING WITH START-WITH|      |       |       |            |          |
|   2 |   TABLE ACCESS FULL                     | EMP  |    14 |   350 |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

The numbers from the 11.2.0.2 show more sophistication than just the cost of the table scan. The optimizer can't know how many levels deep the data is, but version 10.2.0.4 apparently picked 1, and left the total cost unchanged from 3 to 3. I'm curious to know in which version in between 10.2.0.4 and 11.2.0.2 this cost calculation changed. If anyone who is reading this, has a version in between and likes to check, please let me know in the comments. My guess would be that 11.2.0.1 contained the cost change.

What does CONNECT BY NO FILTERING WITH START-WITH do?

Let's explore this, using this table:

SQL> create table t (id, parent_id, value, indicator)
  2  as
  3   select level - 1
  4        , case level when 1 then null else trunc((level-1)/10) end
  5        , round(dbms_random.value * 1000)
  6        , case mod(level,10) when 4 then 'N' else 'Y' end
  7     from dual
  8  connect by level <= 100000
  9  /

Table created.

SQL> alter table t
  2    add constraint cbt_pk
  3    primary key (id)
  4  /

Table altered.

SQL> create index i1 on t (parent_id,indicator)
  2  /

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'t',cascade=>true)

The data is tree shaped where each parent node has exactly 9 child nodes. One tenth of the data, with an id that ends with the digit 3, has its indicator column set to 'N'. This select query will make it clearer how the data looks like:

SQL> select *
  2    from t
  3   where id < 24 or id > 99997
  4   order by id
  5  /

        ID  PARENT_ID      VALUE I
---------- ---------- ---------- -
         0                   656 Y
         1          0        289 Y
         2          0        365 Y
         3          0        644 N
         4          0        364 Y
         5          0        841 Y
         6          0        275 Y
         7          0        529 Y
         8          0        500 Y
         9          0        422 Y
        10          1        598 Y
        11          1        104 Y
        12          1        467 Y
        13          1        296 N
        14          1        105 Y
        15          1        220 Y
        16          1        692 Y
        17          1        793 Y
        18          1         29 Y
        19          1        304 Y
        20          2        467 Y
        21          2        716 Y
        22          2        837 Y
        23          2        432 N
     99998       9999        609 Y
     99999       9999         24 Y

26 rows selected.

When hearing the word "filter", I almost immediately associate it with a WHERE clause. But a where clause in a connect by query, is not what is meant by connect by filtering. The documentation states:

Oracle processes hierarchical queries as follows:

A join, if present, is evaluated first, whether the join is specified in the FROM clause or with WHERE clause predicates.

The CONNECT BY condition is evaluated.

Any remaining WHERE clause predicates are evaluated.

So a where clause predicate is evaluated AFTER the connect by has done its job. You can see that happening here:

SQL> explain plan
  2  for
  3   select id
  4        , parent_id
  5        , sys_connect_by_path(id,'->') scbp
  6     from t
  7    where indicator = 'N'
  8  connect by parent_id = prior id
  9    start with parent_id is null
 10  /

Explained.

SQL> select * from table(dbms_xplan.display)
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2502271019

---------------------------------------------------------------------------------------
| Id  | Operation                      | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |      |    11 |   319 |   164   (3)| 00:00:02 |
|*  1 |  FILTER                        |      |       |       |            |          |
|*  2 |   CONNECT BY WITH FILTERING    |      |       |       |            |          |
|*  3 |    TABLE ACCESS FULL           | T    |     1 |    11 |    80   (2)| 00:00:01 |
|   4 |    NESTED LOOPS                |      |    10 |   240 |    82   (2)| 00:00:01 |
|   5 |     CONNECT BY PUMP            |      |       |       |            |          |
|   6 |     TABLE ACCESS BY INDEX ROWID| T    |    10 |   110 |     2   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN          | I1   |    10 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("INDICATOR"='N')
   2 - access("PARENT_ID"=PRIOR "ID")
   3 - filter("PARENT_ID" IS NULL)
   7 - access("PARENT_ID"="connect$_by$_pump$_002"."prior id   ")

22 rows selected.

The "indicator = 'N'" predicate is at step 1, which is executed after the CONNECT BY WITH FILTERING at step 2. Note that although this query is executed in 11.2.0.2, the optimizer has chosen the old CONNECT BY WITH FILTERING.

Connect by filtering is done by using filters in your CONNECT BY clause. Here is an example using the predicate "indicator = 'N'" inside the CONNECT BY clause:

SQL>  select id
  2        , parent_id
  3        , sys_connect_by_path(id,'->') scbp
  4     from t
  5  connect by parent_id = prior id
  6      and indicator = 'N'
  7    start with parent_id is null
  8  /

        ID  PARENT_ID SCBP
---------- ---------- --------------------------------------------------
         0            ->0
         3          0 ->0->3
        33          3 ->0->3->33
       333         33 ->0->3->33->333
      3333        333 ->0->3->33->333->3333
     33333       3333 ->0->3->33->333->3333->33333

6 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  dzkjzrrzgnvd5, child number 0
-------------------------------------
 select id       , parent_id       , sys_connect_by_path(id,'->') scbp
  from t connect by parent_id = prior id     and indicator = 'N'
start with parent_id is null

Plan hash value: 3164577763

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |      1 |        |      6 |00:00:00.01 |     294 |       |       |          |
|*  1 |  CONNECT BY WITH FILTERING    |      |      1 |        |      6 |00:00:00.01 |     294 |  2048 |  2048 | 2048  (0)|
|*  2 |   TABLE ACCESS FULL           | T    |      1 |      1 |      1 |00:00:00.01 |     277 |       |       |          |
|   3 |   NESTED LOOPS                |      |      6 |      5 |      5 |00:00:00.01 |      17 |       |       |          |
|   4 |    CONNECT BY PUMP            |      |      6 |        |      6 |00:00:00.01 |       0 |       |       |          |
|   5 |    TABLE ACCESS BY INDEX ROWID| T    |      6 |      5 |      5 |00:00:00.01 |      17 |       |       |          |
|*  6 |     INDEX RANGE SCAN          | I1   |      6 |      5 |      5 |00:00:00.01 |      12 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("PARENT_ID"=PRIOR NULL)
   2 - filter("PARENT_ID" IS NULL)
   6 - access("PARENT_ID"="connect$_by$_pump$_002"."prior id     " AND "INDICATOR"='N')


27 rows selected.

In the A-rows column, you can see that the connect by filtering was effective here. Only the necessary rows were being read. And this is the key difference between the two connect by algorithms: with CONNECT BY WITH FILTERING, you can filter within each recursion, whereas CONNECT BY NO FILTERING WITH START-WITH has to read everything, does an in-memory operation, and return the result. With this example, the latter is much less efficient:

SQL>  select /*+ no_connect_by_filtering */ id
  2        , parent_id
  3        , sys_connect_by_path(id,'->') scbp
  4     from t
  5  connect by parent_id = prior id
  6      and indicator = 'N'
  7    start with parent_id is null
  8  /

        ID  PARENT_ID SCBP
---------- ---------- --------------------------------------------------
         0            ->0
         3          0 ->0->3
        33          3 ->0->3->33
       333         33 ->0->3->33->333
      3333        333 ->0->3->33->333->3333
     33333       3333 ->0->3->33->333->3333->33333

6 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  3fcr31tp83by9, child number 0
-------------------------------------
 select /*+ no_connect_by_filtering */ id       , parent_id       ,
sys_connect_by_path(id,'->') scbp    from t connect by parent_id =
prior id     and indicator = 'N'   start with parent_id is null

Plan hash value: 2303479083

----------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |      |      1 |        |      6 |00:00:00.14 |     277 |
|*  1 |  CONNECT BY NO FILTERING WITH START-WITH|      |      1 |        |      6 |00:00:00.14 |     277 |
|   2 |   TABLE ACCESS FULL                     | T    |      1 |    100K|    100K|00:00:00.01 |     277 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("PARENT_ID"=PRIOR NULL)
       filter("PARENT_ID" IS NULL)


22 rows selected.

100K rows were being read, and the A-time was 0.14 seconds instead of 0.01 seconds. I wondered where those 0.14 seconds went to, since the plan shows it's NOT for the full table scan. Using Tom Kyte's runstats_pkg reveals this:

SQL> declare
  2    cursor c1
  3    is
  4     select /*+ connect_by_filtering */ id
  5          , parent_id
  6          , sys_connect_by_path(id,'->') scbp
  7       from t
  8    connect by parent_id = prior id
  9        and indicator = 'N'
 10      start with parent_id is null
 11    ;
 12    cursor c2
 13    is
 14     select /*+ no_connect_by_filtering */ id
 15          , parent_id
 16          , sys_connect_by_path(id,'->') scbp
 17       from t
 18    connect by parent_id = prior id
 19        and indicator = 'N'
 20      start with parent_id is null
 21    ;
 22  begin
 23    runstats_pkg.rs_start;
 24    for r in c1 loop null; end loop;
 25    runstats_pkg.rs_middle;
 26    for r in c2 loop null; end loop;
 27    runstats_pkg.rs_stop;
 28  end;
 29  /
Run1 ran in 0 hsecs
Run2 ran in 10 hsecs
run 1 ran in 0% of the time
 
Name                                  Run1        Run2        Diff
STAT...HSC Heap Segment Block           16          15          -1
STAT...db block changes                 48          47          -1
STAT...consistent gets - exami           9           8          -1
STAT...db block gets from cach          32          33           1
STAT...db block gets                    32          33           1
STAT...redo subscn max counts            0           1           1
STAT...redo ordering marks               0           1           1
STAT...redo entries                     16          15          -1
STAT...calls to kcmgas                   0           1           1
STAT...calls to kcmgcs                  29          28          -1
STAT...free buffer requested             0           1           1
STAT...Heap Segment Array Inse          16          15          -1
STAT...consistent changes               32          31          -1
STAT...heap block compress               9           8          -1
STAT...parse time cpu                    1           0          -1
STAT...buffer is pinned count            1           0          -1
STAT...session cursor cache co           1           0          -1
STAT...sql area evicted                  1           0          -1
LATCH.undo global data                  11          10          -1
LATCH.SQL memory manager worka           3           5           2
LATCH.messages                           0           2           2
LATCH.OS process allocation              0           2           2
LATCH.simulator hash latch              20          23           3
LATCH.object queue header oper           4           1          -3
STAT...workarea executions - o          10           6          -4
STAT...table fetch by rowid             15          10          -5
STAT...index scans kdiixs1               6           0          -6
LATCH.row cache objects                280         274          -6
STAT...sorts (memory)                    8           2          -6
STAT...CPU used by this sessio           2          11           9
STAT...Elapsed Time                      1          11          10
STAT...recursive cpu usage               2          12          10
STAT...no work - consistent re         300         284         -16
STAT...buffer is not pinned co          36          20         -16
STAT...session logical reads           354         337         -17
STAT...consistent gets from ca         313         296         -17
STAT...consistent gets from ca         322         304         -18
LATCH.shared pool                      186         168         -18
STAT...consistent gets                 322         304         -18
LATCH.shared pool simulator             23           4         -19
LATCH.cache buffers chains             785         740         -45
STAT...undo change vector size       3,500       3,420         -80
STAT...redo size                     4,652       4,560         -92
STAT...session uga memory                0     -65,488     -65,488
STAT...session pga memory                0     -65,536     -65,536
STAT...sorts (rows)                     12     100,001      99,989
 
Run1 latches total versus runs -- difference and pct
Run1        Run2        Diff       Pct
1,467       1,384         -83    106.00%

PL/SQL procedure successfully completed

The major difference is the number of rows sorted! The CONNECT BY NO FILTERING WITH START-WITH sorts all 100K rows. This is a surprise, because normally when you sort, you use memory from the PGA workarea, which shows up in your memory statistics from your execution plan. But the no filtering plan did not show those statistics (OMem, 1Mem, Used-Mem). I have no explanation for this phenomenon yet.

Let's zoom in on the sorting:

SQL> select sn.name
  2       , ms.value
  3    from v$mystat ms
  4       , v$statname sn
  5   where ms.statistic# = sn.statistic#
  6     and sn.name like '%sort%'
  7  /

NAME                         VALUE
----------------------- ----------
sorts (memory)                2278
sorts (disk)                     0
sorts (rows)               9425510

3 rows selected.

SQL>  select id
  2        , parent_id
  3        , sys_connect_by_path(id,'->') scbp
  4     from t
  5  connect by parent_id = prior id
  6      and indicator = 'N'
  7    start with parent_id is null
  8  /

        ID  PARENT_ID SCBP
---------- ---------- --------------------------------------------------
         0            ->0
         3          0 ->0->3
        33          3 ->0->3->33
       333         33 ->0->3->33->333
      3333        333 ->0->3->33->333->3333
     33333       3333 ->0->3->33->333->3333->33333

6 rows selected.

SQL> select sn.name
  2       , ms.value
  3    from v$mystat ms
  4       , v$statname sn
  5   where ms.statistic# = sn.statistic#
  6     and sn.name like '%sort%'
  7  /

NAME                         VALUE
----------------------- ----------
sorts (memory)                2286
sorts (disk)                     0
sorts (rows)               9425522

3 rows selected.

SQL>  select /*+ no_connect_by_filtering */ id
  2        , parent_id
  3        , sys_connect_by_path(id,'->') scbp
  4     from t
  5  connect by parent_id = prior id
  6      and indicator = 'N'
  7    start with parent_id is null
  8  /

        ID  PARENT_ID SCBP
---------- ---------- --------------------------------------------------
         0            ->0
         3          0 ->0->3
        33          3 ->0->3->33
       333         33 ->0->3->33->333
      3333        333 ->0->3->33->333->3333
     33333       3333 ->0->3->33->333->3333->33333

6 rows selected.

SQL> select sn.name
  2       , ms.value
  3    from v$mystat ms
  4       , v$statname sn
  5   where ms.statistic# = sn.statistic#
  6     and sn.name like '%sort%'
  7  /

NAME                         VALUE
----------------------- ----------
sorts (memory)                2288
sorts (disk)                     0
sorts (rows)               9525523

3 rows selected.

So CONNECT BY WITH FILTERING did 8 sorts (2286 - 2278) and sorted 12 rows (9425522 - 9425510), whereas CONNECT BY NO FILTERING WITH START-WITH did 2 (2288 - 2286) sorts and sorted 100,001 rows (9525523 - 9425522).

And finally, I promised to explain why the first two queries of this blogpost are identical, but show a different execution plan. The reason is simple: the first one is executed on 10.2.0.4 and the second one on 11.2.0.2.

6 comments:

DomBrooksMarch 26, 2012 at 8:43 PM
Rob,

Interesting article.

Event 10032 confirms that DBMS_XPLAN is being economical with the truth about “sorting”.

As you mentioned, the runstats stats show that the NO_CONNECT_BY_FILTERING did 2 in-memory sorts (compared to the 8 for the other plan). This is backed up by the 10032 trace which, in my case, indicated that one of these 2 sorts in memory sorted 100000 rows (doing lots of comparisons) in 6885376 bytes.

So, a lot more rows sorted and significantly higher memory usage.

So at the very least it looks like we can say that DBMS_XPLAN does not accurately report memory usage for CONNECT_BY_NO_FILTERING operations?

Cheers,
Dominic
Rob van WijkMarch 27, 2012 at 9:25 PM
Thanks for your comment, Dominic.

I did a 10032 trace for the first time ever. It's a great way to see how individual sort operations performed when your query performs more than one sort at the same time. I saw a similar number of bytes: 7724032. And approximately 1,5M comparisons performed.

I reached the same conclusion as you. And it's way too early for me to generalize further, but I'm more sceptic now about the MEMSTATS columns of dbms_xplan.
Vidhya ArunDecember 14, 2012 at 5:36 PM
Rob,

Most excellent!

Just one thing though - in my case, for a similar query [without adding the CONNECT_BY_NO_FILTERING hint] my Oracle Database 11.2.0.3.0 optimizer randomly chooses one plan, then the other plan - there seems to be no pattern. I will investigate - but your post is very helpful. Thanks.

Regards,
Arun
RajjyJuly 16, 2013 at 5:03 PM
Hi ,

Can some one please explain me how to comprehend/understand multiple conditions in connect by prior conditions with some example data.

I am creating a table like this

CREATE TABLE FAMiLY_TREE

(

GRAND_FATHERID number,

FATHER_ID number,

SON_ID number,

person_name varchar(20)

);

INSERT INTO FAMILY_TREE (GRAND_FATHERID ,FATHER_ID , SON_ID , PERSON_NAME ) VALUES

(NULL, NULL , 5 , 'Mr X ' );

INSERT INTO FAMILY_TREE (GRAND_FATHERID ,FATHER_ID , SON_ID , PERSON_NAME ) VALUES

(null, 5 , 6 , 'Dave' );

INSERT INTO FAMILY_TREE (GRAND_FATHERID ,FATHER_ID , SON_ID , PERSON_NAME ) VALUES

(5, 6 , 7 , 'Vinny' );

INSERT INTO FAMILY_TREE (GRAND_FATHERID ,FATHER_ID , SON_ID , PERSON_NAME ) VALUES

(5, 6 , 14 , 'Omy' );

INSERT INTO FAMILY_TREE (GRAND_FATHERID ,FATHER_ID , SON_ID , PERSON_NAME ) VALUES

(5, 6 , 15 , 'Vijjy' );

INSERT INTO FAMILY_TREE (GRAND_FATHERID ,FATHER_ID , SON_ID , PERSON_NAME ) VALUES

(6, 7 , 8 , 'Vicky' );

INSERT INTO FAMILY_TREE (GRAND_FATHERID ,FATHER_ID , SON_ID , PERSON_NAME ) VALUES

(6, 7 , 9 , 'Varis' );

INSERT INTO FAMILY_TREE (GRAND_FATHERID ,FATHER_ID , SON_ID , PERSON_NAME ) VALUES

(7, 8 , 10 , 'Vshnu' );

INSERT INTO FAMILY_TREE (GRAND_FATHERID ,FATHER_ID , SON_ID , PERSON_NAME ) VALUES

(7, 8 , 11 , 'dyna' );

INSERT INTO FAMILY_TREE (GRAND_FATHERID ,FATHER_ID , SON_ID , PERSON_NAME ) VALUES

(8, 10 , 14 , 'Marry' );

INSERT INTO FAMILY_TREE (GRAND_FATHERID ,FATHER_ID , SON_ID , PERSON_NAME ) VALUES

(8, 10 , 15 , 'Mac' );

INSERT INTO FAMILY_TREE (GRAND_FATHERID ,FATHER_ID , SON_ID , PERSON_NAME ) VALUES

(7, 9 , 12 , 'Garry' );

INSERT INTO FAMILY_TREE (GRAND_FATHERID ,FATHER_ID , SON_ID , PERSON_NAME ) VALUES

(7, 9 , 13 , 'Ganny' );

SELECT

LPAD(' ', LEVEL*3) || PERSON_NAME FROM FAMILY_TREE

START WITH SON_ID= 6

CONNECT BY PRIOR SON_ID = FATHER_ID

AND PRIOR FATHER_ID = GRAND_FATHERID ;

SELECT

LPAD(' ', LEVEL*3) || PERSON_NAME FROM FAMILY_TREE

START WITH SON_ID= 6

CONNECT BY PRIOR SON_ID = FATHER_ID ;

Both These query return the same o/p

Dave

Vinny

Vicky

Vshnu

Marry

Mac

dyna

Varis

Garry

Ganny

Omy

Vijjy

Can some one please explain me comprehension of both these query or give me a example where i can understand multiple connect by prior conditions

Thanks
Rob van WijkJuly 20, 2013 at 10:46 AM
Rajji,

This is a blog. Please post your questions on a forum like OTN or Stack Overflow.

Thanks,
Rob.
mikJanuary 21, 2014 at 5:10 PM
The "connect by no filtering" sorts 100000 rows, because the filter is on connect by, and not global. Try this:

select /*+ no_connect_by_filtering */ id
, parent_id
, sys_connect_by_path(id,'->') scbp
from (select * from t where indicator = 'N' or parent_id is null)
connect by parent_id = prior id
start with parent_id is null

About Oracle

Pages

Sunday, March 25, 2012

Connect By Filtering

6 comments: