Monday, February 19, 2018

Filter predicates with NVL on mandatory columns using the index

Last Friday at work, I overheard a conversation between two DBA's about a performance problem of a job which usually ran in 10 minutes, now taking more than 4 hours. I helped them pinpoint the problem: a simple SQL query which took >95% of the time. Clearly this query was the culprit. The query selects from just one table but with 10 filter predicates of this form:

nvl(column,'@') = nvl(:bind_variable,'@')

And all 10 columns where part of a single unique constraint. The trace file showed the query being executed 1500 times and the unique index belonging to the unique constraint was not being used, resulting in 1500 full table scans. Your index won't be used if you use a function around your column, like shown here. This is what I have learned in the past and what I had experienced many times. The query was part of a function generated by a TAPI generator, so I checked the table definition and to my surprise I saw the columns were all defined as NOT NULL, meaning the NVL constructs were simply unnecessary. So my advice to the developer responsible for this code, was to manually alter the generated code and remove the unnecessary NVL functions, which would make the query be able to use the unique index again.

This morning at the coffee machine, I asked the developer how things were going with the performance problem and he mentioned the problem was solved by rebuilding the index and without altering the code. I could not believe this, because a query with all those NVL predicates would not be able to use the index ever, right? So I built a little test case to see it for my self (or not) and ran it on my 12.1.0.2 database and subsequently ran the script on a very old 9.2.0.5 database and I saw a difference. I then narrowed the versions down to a 10.2.0.3 and 11.2.0.4 database with the same difference.

Here is the output of my testcase on a 10.2.0.3 database:

SQL> create table robtest
  2  as
  3   select level id
  4        , ceil(level/10000)  code1
  5        , ceil(level/1000)   code2
  6        , ceil(level/100)    code3
  7        , mod(level/100,100) code4
  8        , lpad('*',1000,'*') description
  9     from dual
10  connect by level <= 100000
11  /
 
Tabel is aangemaakt.
 
SQL> alter table robtest add constraint robtest_uk unique (code1,code2,code3,code4)
  2  /
 
Tabel is gewijzigd.

SQL> exec dbms_stats.gather_table_stats(user,'robtest',cascade=>true)
 
PL/SQL-procedure is geslaagd.
An analyzed table with 100,000 rows, where the four columns code1, code2, code3 and code4 make up the functional unique key. The four columns don't have a NOT NULL constraint, although they only contain NOT NULL values. In this particular 10.2.0.3 database, setting autotrace on did not fully succeed, but we still have enough information to draw conclusions:
SQL> set autotrace on
SP2-0618: Kan het sessie-identificatiesymbool niet vinden. Controleer of PLUSTRACE-rol actief is.
SP2-0611: Fout bij activeren van STATISTICS-rapport.
SQL> select *
  2    from robtest
  3   where code1 = 1
  4     and code2 = 1
  5     and code3 = 1
  6     and code4 = 1
  7  /
 
        ID      CODE1      CODE2      CODE3      CODE4 DESCRIPTION
---------- ---------- ---------- ---------- ---------- ----------------------------------------
       100          1          1          1          1 ****************************************
 
1 rij is geselecteerd.
 
 
Uitvoeringspan
----------------------------------------------------------
Plan hash value: 1745344684
 
------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |     1 |  1019 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ROBTEST    |     1 |  1019 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | ROBTEST_UK |     1 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("CODE1"=1 AND "CODE2"=1 AND "CODE3"=1 AND "CODE4"=1)
Of course, four regular filters lead to a nice access filter and INDEX UNIQUE SCAN, as expected.
SQL> select *
  2    from robtest
  3   where nvl(code1,-1) = 1
  4     and nvl(code2,-1) = 1
  5     and nvl(code3,-1) = 1
  6     and nvl(code4,-1) = 1
  7  /
 
        ID      CODE1      CODE2      CODE3      CODE4 DESCRIPTION
---------- ---------- ---------- ---------- ---------- ----------------------------------------
       100          1          1          1          1 ****************************************
 
1 rij is geselecteerd.
 
 
Uitvoeringspan
----------------------------------------------------------
Plan hash value: 1915841528
 
-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     1 |  1019 |  3172   (1)| 00:00:39 |
|*  1 |  TABLE ACCESS FULL| ROBTEST |     1 |  1019 |  3172   (1)| 00:00:39 |
-----------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(NVL("CODE4",(-1))=1 AND NVL("CODE3",(-1))=1 AND
              NVL("CODE2",(-1))=1 AND NVL("CODE1",(-1))=1)
And using NVL around the columns disables the use of the index. Even adding NOT NULL constraints on those four columns doesn't change the execution plan:
SQL> alter table robtest modify code1 constraint nn1 not null
  2  /
 
Tabel is gewijzigd.
 
SQL> alter table robtest modify code2 constraint nn2 not null
  2  /
 
Tabel is gewijzigd.
 
SQL> alter table robtest modify code3 constraint nn3 not null
  2  /
 
Tabel is gewijzigd.
 
SQL> alter table robtest modify code4 constraint nn4 not null
  2  /
 
Tabel is gewijzigd.
 
SQL> select *
  2    from robtest
  3   where nvl(code1,-1) = 1
  4     and nvl(code2,-1) = 1
  5     and nvl(code3,-1) = 1
  6     and nvl(code4,-1) = 1
  7  /
 
        ID      CODE1      CODE2      CODE3      CODE4 DESCRIPTION
---------- ---------- ---------- ---------- ---------- ----------------------------------------
       100          1          1          1          1 ****************************************
 
1 rij is geselecteerd.
 
 
Uitvoeringspan
----------------------------------------------------------
Plan hash value: 1915841528
 
-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     1 |  1019 |  3172   (1)| 00:00:39 |
|*  1 |  TABLE ACCESS FULL| ROBTEST |     1 |  1019 |  3172   (1)| 00:00:39 |
-----------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(NVL("CODE4",(-1))=1 AND NVL("CODE3",(-1))=1 AND
              NVL("CODE2",(-1))=1 AND NVL("CODE1",(-1))=1)
So far, everything as expected. Now watch what happens when executing the same scenario on a 11.2.0.4 database:
SQL> create table robtest
  2  as
  3   select level id
  4        , ceil(level/10000)  code1
  5        , ceil(level/1000)   code2
  6        , ceil(level/100)    code3
  7        , mod(level/100,100) code4
  8        , lpad('*',1000,'*') description
  9     from dual
10  connect by level <= 100000
11  /
 
Tabel is aangemaakt.
 
SQL> alter table robtest add constraint robtest_uk unique (code1,code2,code3,code4)
  2  /
 
Tabel is gewijzigd.
 
SQL> exec dbms_stats.gather_table_stats(user,'robtest',cascade=>true)
 
PL/SQL-procedure is geslaagd.
 
SQL> set autotrace on
SQL> select *
  2    from robtest
  3   where code1 = 1
  4     and code2 = 1
  5     and code3 = 1
  6     and code4 = 1
  7  /
 
        ID      CODE1      CODE2      CODE3      CODE4 DESCRIPTION
---------- ---------- ---------- ---------- ---------- ----------------------------------------
       100          1          1          1          1 ****************************************
 
1 rij is geselecteerd.
 
 
Uitvoeringspan
----------------------------------------------------------
Plan hash value: 1291870759
 
------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |     1 |  1020 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ROBTEST    |     1 |  1020 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | ROBTEST_UK |     1 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("CODE1"=1 AND "CODE2"=1 AND "CODE3"=1 AND "CODE4"=1)
 
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
       1590  bytes sent via SQL*Net to client
        348  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
 
SQL> select *
  2    from robtest
  3   where nvl(code1,-1) = 1
  4     and nvl(code2,-1) = 1
  5     and nvl(code3,-1) = 1
  6     and nvl(code4,-1) = 1
  7  /
 
        ID      CODE1      CODE2      CODE3      CODE4 DESCRIPTION
---------- ---------- ---------- ---------- ---------- ----------------------------------------
       100          1          1          1          1 ****************************************
 
1 rij is geselecteerd.
 
 
Uitvoeringspan
----------------------------------------------------------
Plan hash value: 83448843
 
-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     1 |  1020 |  3875   (1)| 00:00:47 |
|*  1 |  TABLE ACCESS FULL| ROBTEST |     1 |  1020 |  3875   (1)| 00:00:47 |
-----------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(NVL("CODE4",(-1))=1 AND NVL("CODE3",(-1))=1 AND
              NVL("CODE2",(-1))=1 AND NVL("CODE1",(-1))=1)
 
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      14297  consistent gets
          0  physical reads
          0  redo size
       1649  bytes sent via SQL*Net to client
        359  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
 
SQL> alter table robtest modify code1 constraint nn1 not null
  2  /
 
Tabel is gewijzigd.
 
SQL> alter table robtest modify code2 constraint nn2 not null
  2  /
 
Tabel is gewijzigd.
 
SQL> alter table robtest modify code3 constraint nn3 not null
  2  /
 
Tabel is gewijzigd.
 
SQL> alter table robtest modify code4 constraint nn4 not null
  2  /
 
Tabel is gewijzigd.
 
SQL> select *
  2    from robtest
  3   where nvl(code1,-1) = 1
  4     and nvl(code2,-1) = 1
  5     and nvl(code3,-1) = 1
  6     and nvl(code4,-1) = 1
  7  /
 
        ID      CODE1      CODE2      CODE3      CODE4 DESCRIPTION
---------- ---------- ---------- ---------- ---------- ----------------------------------------
       100          1          1          1          1 ****************************************
 
1 rij is geselecteerd.
 
 
Uitvoeringspan
----------------------------------------------------------
Plan hash value: 1291870759
 
------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |     1 |  1020 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ROBTEST    |     1 |  1020 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | ROBTEST_UK |     1 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("CODE1"=1 AND "CODE2"=1 AND "CODE3"=1 AND "CODE4"=1)
 
 
Statistics
----------------------------------------------------------
         38  recursive calls
          0  db block gets
         65  consistent gets
          0  physical reads
          0  redo size
       1590  bytes sent via SQL*Net to client
        348  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          1  rows processed
As you can see in the execution plan, a INDEX UNIQUE SCAN is used. The predicate information section tells us that the optimizer was able to transform the predicates "NVL(column_with_not_null_constraint,-1)" to simply "column_with_not_null_constraint". The extra recursive calls and consistent gets disappear for subsequent executions. So somewhere between versions 10.2.0.3 and 11.2.0.4, the optimizer introduced this nice little optimization.

If you have a database at hand with a version in between the two mentioned versions and you would like to try, here is the script you can copy-paste-and-run:
set echo on
set serveroutput off
column description format a40 truncate
select * from v$version
/
create table robtest
as
select level id
      , ceil(level/10000)  code1
      , ceil(level/1000)   code2
      , ceil(level/100)    code3
      , mod(level/100,100) code4
      , lpad('*',1000,'*') description
   from dual
connect by level <= 100000
/
alter table robtest add constraint robtest_uk unique (code1,code2,code3,code4)
/
exec dbms_stats.gather_table_stats(user,'robtest',cascade=>true)
set autotrace on
select *
  from robtest
where code1 = 1
   and code2 = 1
   and code3 = 1
   and code4 = 1
/
select *
  from robtest
where nvl(code1,-1) = 1
   and nvl(code2,-1) = 1
   and nvl(code3,-1) = 1
   and nvl(code4,-1) = 1
/
alter table robtest modify code1 constraint nn1 not null
/
alter table robtest modify code2 constraint nn2 not null
/
alter table robtest modify code3 constraint nn3 not null
/
alter table robtest modify code4 constraint nn4 not null
/
select *
  from robtest
where nvl(code1,-1) = 1
   and nvl(code2,-1) = 1
   and nvl(code3,-1) = 1
   and nvl(code4,-1) = 1
/
set autotrace off
set echo off
drop table robtest purge
/
Please paste your results in the comments section along with the version used.