About Oracle: UPSERT ALL

A few weeks ago, I received a question from a BI-colleague who had a problem with a query using the model clause in a materialized view. He wanted to generate new sets of data according to some rules, but he got stuck. The query he used was huge: approximately 680 packed lines containing a model with 100+ rules. It used 12 dimension columns and operated on a table containing 90K rows.

The question was very interesting though and I’ll try to simulate it with a smaller table like this:

SQL> create table mytable (code, col1, col2, col3, value)
  2  as
  3  select 'MW001', 1, 1, 1, 11 from dual union all
  4  select 'MW001', 1, 1, 2, 12 from dual union all
  5  select 'MW001', 1, 1, 3, 13 from dual union all
  6  select 'MW001', 1, 1, 4, 14 from dual union all
  7  select 'MW002', 1, 1, 1, 21 from dual union all
  8  select 'MW002', 1, 1, 2, 22 from dual union all
  9  select 'MW002', 1, 1, 3, 23 from dual union all
 10  select 'MW002', 1, 1, 4, 24 from dual union all
 11  select 'MW003', 1, 1, 1, 31 from dual union all
 12  select 'MW003', 1, 1, 2, 32 from dual union all
 13  select 'MW003', 1, 1, 3, 33 from dual union all
 14  select 'MW003', 1, 1, 4, 34 from dual union all
 15  select 'MW004', 1, 1, 1, 41 from dual union all
 16  select 'MW004', 1, 1, 2, 42 from dual union all
 17  select 'MW004', 1, 1, 3, 43 from dual union all
 18  select 'MW004', 1, 1, 4, 44 from dual union all
 19  select 'MW005', 1, 1, 1, 51 from dual union all
 20  select 'MW005', 1, 1, 2, 52 from dual union all
 21  select 'MW005', 1, 1, 3, 53 from dual union all
 22  select 'MW005', 1, 1, 4, 54 from dual union all
 23  select 'MW011', 11, 1, 1, 11 from dual union all
 24  select 'MW011', 11, 1, 2, 12 from dual union all
 25  select 'MW011', 11, 1, 3, 13 from dual union all
 26  select 'MW012', 11, 1, 1, 21 from dual union all
 27  select 'MW012', 11, 1, 2, 22 from dual union all
 28  select 'MW012', 11, 1, 3, 23 from dual union all
 29  select 'MW013', 11, 1, 1, 31 from dual union all
 30  select 'MW013', 11, 1, 2, 32 from dual union all
 31  select 'MW013', 11, 1, 3, 33 from dual
 32  /

Tabel is aangemaakt.

In this table there are 8 sets of data, identified by the column called code. The col1, col2 and col3 values identify the rows within a set. Together with the code column they can be considered the primary key. The purpose is to generate new sets of data, using the same key values col1, col2 and col3, but with a simple formula to calculate the value column. In the example I will use the following rules to simulate the original problem

MW006 == MW001 + MW002
MW007 == MW006 - MW003
MW008 == MW004 + MW005
MW009 == MW007 + MW008
MW014 == MW011 + MW012
MW015 == MW013 + MW014

These rules would have to lead to the following extra rows:

 CODE        COL1       COL2       COL3      VALUE
----- ---------- ---------- ---------- ----------
MW006          1          1          1         32
MW006          1          1          2         34
MW006          1          1          3         36
MW006          1          1          4         38
MW007          1          1          1          1
MW007          1          1          2          2
MW007          1          1          3          3
MW007          1          1          4          4
MW008          1          1          1         92
MW008          1          1          2         94
MW008          1          1          3         96
MW008          1          1          4         98
MW009          1          1          1         93
MW009          1          1          2         96
MW009          1          1          3         99
MW009          1          1          4        102
MW014         11          1          1         32
MW014         11          1          2         34
MW014         11          1          3         36
MW015         11          1          1         63
MW015         11          1          2         66
MW015         11          1          3         69

The value 32 in the required output’s first row is the result from 11 + 21, and 34 = 12 + 22 and so on. Note that the col1, col2 and col3 match exactly within each set.

If you look back at the create table statement, you will notice that are two set types MW001 – MW005, all having four identical key values, and MW011 – MW013 having three identical key values. Of course it is only possible to use sets within one set type in the rules. This is a precondition for this problem and not something we have to validate being true.

My colleague had read the documentation very carefully and used the same construction as recommended in it. Here is a link to that part of the documentation, just scroll down a little bit. He used the construct as described in the part about adding new sales for Poland, which resulted in this query:

SQL> select *
  2    from mytable
  3   model
  4         return updated rows
  5         dimension by (code, col1, col2, col3)
  6         measures (value)
  7         rules
  8         ( value
  9           [ for (code, col1, col2, col3) in
 10             ( select distinct 'MW006', col1, col2, col3
 11                 from mytable
 12                where code = 'MW001'
 13             )
 14           ] =   value['MW001', cv(), cv(), cv()]
 15               + value['MW002', cv(), cv(), cv()]
 16         )
 17   order by code
 18       , col1
 19       , col2
 20       , col3
 21  /

CODE        COL1       COL2       COL3      VALUE
----- ---------- ---------- ---------- ----------
MW006          1          1          1         32
MW006          1          1          2         34
MW006          1          1          3         36
MW006          1          1          4         38

4 rijen zijn geselecteerd.

My colleague got stuck with this problem because he used something similar as below for the rule for MW007:

SQL> select *
  2    from mytable
  3   model
  4         return updated rows
  5         dimension by (code, col1, col2, col3)
  6         measures (value)
  7         rules
  8         ( -- MW006
  9           value
 10           [ for (code, col1, col2, col3) in
 11             ( select distinct 'MW006', col1, col2, col3
 12                 from mytable
 13                where code = 'MW001'
 14             )
 15           ] =   value['MW001', cv(), cv(), cv()]
 16               + value['MW002', cv(), cv(), cv()]
 17         , -- MW007
 18           value
 19           [ for (code, col1, col2, col3) in
 20             ( select distinct 'MW007', col1, col2, col3
 21                 from mytable
 22                where code = 'MW006'
 23             )
 24           ] =   value['MW006', cv(), cv(), cv()]
 25               - value['MW003', cv(), cv(), cv()]
 26         )
 27   order by code
 28       , col1
 29       , col2
 30       , col3
 31  /

CODE        COL1       COL2       COL3      VALUE
----- ---------- ---------- ---------- ----------
MW006          1          1          1         32
MW006          1          1          2         34
MW006          1          1          3         36
MW006          1          1          4         38

4 rijen zijn geselecteerd.

And the four rows of set MW007 are not generated. It does not work because the model clause does not insert rows into the table: it only generates a result set. So selecting rows “where code = ‘MW006’” will never retrieve a row. So this line of thought is clearly wrong. To make this query work, it is absolutely necessary to know which base set should be used to generate the rows. If we change the query for the second rule to use the predicate “code = ‘MW001’” instead, the query works:

SQL> select *
  2    from mytable
  3   model
  4         return updated rows
  5         dimension by (code, col1, col2, col3)
  6         measures (value)
  7         rules
  8         ( -- MW006
  9           value
 10           [ for (code, col1, col2, col3) in
 11             ( select distinct 'MW006', col1, col2, col3
 12                 from mytable
 13                where code = 'MW001'
 14             )
 15           ] =   value['MW001', cv(), cv(), cv()]
 16               + value['MW002', cv(), cv(), cv()]
 17         , -- MW007
 18           value
 19           [ for (code, col1, col2, col3) in
 20             ( select distinct 'MW007', col1, col2, col3
 21                 from mytable
 22                where code = 'MW001'
 23             )
 24           ] =   value['MW006', cv(), cv(), cv()]
 25               - value['MW003', cv(), cv(), cv()]
 26         )
 27   order by code
 28       , col1
 29       , col2
 30       , col3
 31  /

CODE        COL1       COL2       COL3      VALUE
----- ---------- ---------- ---------- ----------
MW006          1          1          1         32
MW006          1          1          2         34
MW006          1          1          3         36
MW006          1          1          4         38
MW007          1          1          1          1
MW007          1          1          2          2
MW007          1          1          3          3
MW007          1          1          4          4

8 rijen zijn geselecteerd.

But why does Oracle advice to use a separate query to generate a new set? It is a new query that has to be executed, and in the original case there would be 100+ of them. Now if it was the only way to achieve the result …, but it’s not: it’s what the UPSERT ALL clause is all about! In my experience the UPSERT ALL clause is rarely used, but this is a showcase. However, you have to know exactly how it works, or you might get surprised by this behaviour:

SQL> select *
  2    from mytable
  3   model
  4         return updated rows
  5         dimension by (code, col1, col2, col3)
  6         measures (value)
  7         rules upsert all
  8         ( value ['MW006', col1, col2, col3]
  9           =   value['MW001', cv(), cv(), cv()]
 10             + value['MW002', cv(), cv(), cv()]
 11         )
 12   order by code
 13       , col1
 14       , col2
 15       , col3
 16  /

CODE        COL1       COL2       COL3      VALUE
----- ---------- ---------- ---------- ----------
MW006          1          1          1         32
MW006          1          1          2         34
MW006          1          1          3         36
MW006          1          1          4         38
MW006         11          1          1
MW006         11          1          2
MW006         11          1          3

7 rijen zijn geselecteerd.

Here 7 rows are selected instead of 4, because – as documented – all unique combinations that satisfy the non positional references, being col1, col2 and col3, are identified. Since I want only four I have to introduce an extra dimension value code2 based on code:

SQL> select *
  2    from mytable
  3   model
  4         return updated rows
  5         dimension by (code, code code2, col1, col2, col3)
  6         measures (value)
  7         rules upsert all
  8         ( value ['MW006', code2 = 'MW001', col1, col2, col3]
  9           =   value['MW001', 'MW001', cv(), cv(), cv()]
 10             + value['MW002', 'MW002', cv(), cv(), cv()]
 11         )
 12   order by code
 13       , col1
 14       , col2
 15       , col3
 16  /

CODE  CODE2       COL1       COL2       COL3      VALUE
----- ----- ---------- ---------- ---------- ----------
MW006 MW001          1          1          1         32
MW006 MW001          1          1          2         34
MW006 MW001          1          1          3         36
MW006 MW001          1          1          4         38

4 rijen zijn geselecteerd.

For this solution it is also –just as the solution using a query per rule- absolutely necessary to know against which base set the rows should be generated. This is a tedious task in a situation with 100+ rules, but there is no way around that. At least a lot of inner queries are avoided using UPSERT ALL.

I ended up suggesting a third alternative. However, looking back on it, I am not at all happy with this advice because the third alternative doesn’t relieve the fact of having to know the base set. I guess I was too disappointed about not getting to the point of not having to know the base set, that I turned down the UPSERT ALL variant altogether.
The third alternative was this:

SQL> select *
  2    from mytable
  3   model
  4         return updated rows
  5         reference r on (select code, count(*) countcode from mytable group by code)
  6           dimension by (code coderef)
  7           measures (countcode)
  8         main m
  9           dimension by (code, row_number() over (partition by code order by col1, col2, col3) rn)
 10           measures (col1, col2, col3, value)
 11           rules
 12           ( col1['MW006',for rn from 1 to countcode['MW001'] increment 1] = col1['MW001',cv()]
 13           , col2['MW006',any] = col2['MW001',cv()]
 14           , col3['MW006',any] = col3['MW001',cv()]
 15           , value['MW006',any] = value['MW001',cv()] + value['MW002',cv()]
 16           , col1['MW007',for rn from 1 to countcode['MW001'] increment 1] = col1['MW001',cv()]
 17           , col2['MW007',any] = col2['MW001',cv()]
 18           , col3['MW007',any] = col3['MW001',cv()]
 19           , value['MW007',any] = value['MW006',cv()] - value['MW003',cv()]
 20           , col1['MW008',for rn from 1 to countcode['MW004'] increment 1] = col1['MW001',cv()]
 21           , col2['MW008',any] = col2['MW001',cv()]
 22           , col3['MW008',any] = col3['MW001',cv()]
 23           , value['MW008',any] = value['MW004',cv()] + value['MW005',cv()]
 24           , col1['MW009',for rn from 1 to countcode['MW001'] increment 1] = col1['MW001',cv()]
 25           , col2['MW009',any] = col2['MW001',cv()]
 26           , col3['MW009',any] = col3['MW001',cv()]
 27           , value['MW009',any] = value['MW007',cv()] + value['MW008',cv()]
 28           , col1['MW014',for rn from 1 to countcode['MW011'] increment 1] = col1['MW011',cv()]
 29           , col2['MW014',any] = col2['MW011',cv()]
 30           , col3['MW014',any] = col3['MW011',cv()]
 31           , value['MW014',any] = value['MW011',cv()] + value['MW012',cv()]
 32           , col1['MW015',for rn from 1 to countcode['MW011'] increment 1] = col1['MW011',cv()]
 33           , col2['MW015',any] = col2['MW011',cv()]
 34           , col3['MW015',any] = col3['MW011',cv()]
 35           , value['MW015',any] = value['MW013',cv()] + value['MW014',cv()]
 36           )
 37   order by code
 38       , col1
 39       , col2
 40       , col3
 41  /

CODE          RN       COL1       COL2       COL3      VALUE
----- ---------- ---------- ---------- ---------- ----------
MW006          1          1          1          1         32
MW006          2          1          1          2         34
MW006          3          1          1          3         36
MW006          4          1          1          4         38
MW007          1          1          1          1          1
MW007          2          1          1          2          2
MW007          3          1          1          3          3
MW007          4          1          1          4          4
MW008          1          1          1          1         92
MW008          2          1          1          2         94
MW008          3          1          1          3         96
MW008          4          1          1          4         98
MW009          1          1          1          1         93
MW009          2          1          1          2         96
MW009          3          1          1          3         99
MW009          4          1          1          4        102
MW014          1         11          1          1         32
MW014          2         11          1          2         34
MW014          3         11          1          3         36
MW015          1         11          1          1         63
MW015          2         11          1          2         66
MW015          3         11          1          3         69

22 rijen zijn geselecteerd.

The key point here is to use the FOR loop to generate the rows. To do so I need to index all entries by means of the ROW_NUMBER analytic function, and use a reference model to know how many rows to generate. It works, but it is more complex compared to the UPSERT ALL alternative and there are a lot more rules this way, because of the increase in measure values.

Luckily they turned down the third alternative, but they could have used the UPSERT ALL perfectly. The advice came too late however, so they headed towards a totally different approach. Very likely something which does not involve using materialized views with a model clause query.

The complete UPSERT ALL solution looks like this:

SQL> select *
  2    from mytable
  3   model
  4         return updated rows
  5         dimension by (code, code code2, col1, col2, col3)
  6         measures (value)
  7         rules upsert all
  8         ( value ['MW006', code2 = 'MW001', col1, col2, col3]
  9           =   value['MW001', 'MW001', cv(), cv(), cv()]
 10             + value['MW002', 'MW002', cv(), cv(), cv()]
 11         , value ['MW007', code2 = 'MW001', col1, col2, col3]
 12           =   value['MW006', 'MW001', cv(), cv(), cv()]
 13             - value['MW003', 'MW003', cv(), cv(), cv()]
 14         , value ['MW008', code2 = 'MW001', col1, col2, col3]
 15           =   value['MW004', 'MW004', cv(), cv(), cv()]
 16             + value['MW005', 'MW005', cv(), cv(), cv()]
 17         , value ['MW009', code2 = 'MW001', col1, col2, col3]
 18           =   value['MW007', 'MW001', cv(), cv(), cv()]
 19             + value['MW008', 'MW001', cv(), cv(), cv()]
 20         , value ['MW014', code2 = 'MW011', col1, col2, col3]
 21           =   value['MW011', 'MW011', cv(), cv(), cv()]
 22             + value['MW012', 'MW012', cv(), cv(), cv()]
 23         , value ['MW015', code2 = 'MW011', col1, col2, col3]
 24           =   value['MW013', 'MW013', cv(), cv(), cv()]
 25             + value['MW014', 'MW011', cv(), cv(), cv()]
 26         )
 27   order by code
 28       , col1
 29       , col2
 30       , col3
 31  /

CODE  CODE2       COL1       COL2       COL3      VALUE
----- ----- ---------- ---------- ---------- ----------
MW006 MW001          1          1          1         32
MW006 MW001          1          1          2         34
MW006 MW001          1          1          3         36
MW006 MW001          1          1          4         38
MW007 MW001          1          1          1          1
MW007 MW001          1          1          2          2
MW007 MW001          1          1          3          3
MW007 MW001          1          1          4          4
MW008 MW001          1          1          1         92
MW008 MW001          1          1          2         94
MW008 MW001          1          1          3         96
MW008 MW001          1          1          4         98
MW009 MW001          1          1          1         93
MW009 MW001          1          1          2         96
MW009 MW001          1          1          3         99
MW009 MW001          1          1          4        102
MW014 MW011         11          1          1         32
MW014 MW011         11          1          2         34
MW014 MW011         11          1          3         36
MW015 MW011         11          1          1         63
MW015 MW011         11          1          2         66
MW015 MW011         11          1          3         69

22 rijen zijn geselecteerd.

Or maybe for sake of clarity you could separate the generating of new rows based on original base sets from the calculation of the new values like this:

SQL> select *
  2    from ( select code, col1, col2, col3, value
  3             from mytable
  4            model
  5                  dimension by (code, code code2, col1, col2, col3)
  6                  measures (value)
  7                  rules upsert all
  8                  ( value['MW006', code2 = 'MW001', col1, col2, col3] = 0
  9                  , value['MW007', code2 = 'MW001', col1, col2, col3] = 0
 10                  , value['MW008', code2 = 'MW001', col1, col2, col3] = 0
 11                  , value['MW009', code2 = 'MW001', col1, col2, col3] = 0
 12                  , value['MW014', code2 = 'MW011', col1, col2, col3] = 0
 13                  , value['MW015', code2 = 'MW011', col1, col2, col3] = 0
 14                  )
 15         )
 16   model
 17         return updated rows
 18         dimension by (code, col1, col2, col3)
 19         measures (value)
 20         rules automatic order
 21         ( value['MW006', col1, col2, col3]
 22           =   value['MW001', cv(), cv(), cv()]
 23             + value['MW002', cv(), cv(), cv()]
 24         , value['MW008', col1, col2, col3]
 25           =   value['MW004', cv(), cv(), cv()]
 26             + value['MW005', cv(), cv(), cv()]
 27         , value['MW015', col1, col2, col3]
 28           =   value['MW013', cv(), cv(), cv()]
 29             + value['MW014', cv(), cv(), cv()]
 30         , value['MW009', col1, col2, col3]
 31           =   value['MW007', cv(), cv(), cv()]
 32             + value['MW008', cv(), cv(), cv()]
 33         , value['MW014', col1, col2, col3]
 34           =   value['MW011', cv(), cv(), cv()]
 35             + value['MW012', cv(), cv(), cv()]
 36         , value['MW007', col1, col2, col3]
 37           =   value['MW006', cv(), cv(), cv()]
 38             - value['MW003', cv(), cv(), cv()]
 39         )
 40   order by code
 41       , col1
 42       , col2
 43       , col3
 44  /

CODE        COL1       COL2       COL3      VALUE
----- ---------- ---------- ---------- ----------
MW006          1          1          1         32
MW006          1          1          2         34
MW006          1          1          3         36
MW006          1          1          4         38
MW007          1          1          1          1
MW007          1          1          2          2
MW007          1          1          3          3
MW007          1          1          4          4
MW008          1          1          1         92
MW008          1          1          2         94
MW008          1          1          3         96
MW008          1          1          4         98
MW009          1          1          1         93
MW009          1          1          2         96
MW009          1          1          3         99
MW009          1          1          4        102
MW014         11          1          1         32
MW014         11          1          2         34
MW014         11          1          3         36
MW015         11          1          1         63
MW015         11          1          2         66
MW015         11          1          3         69

22 rijen zijn geselecteerd.

UPDATE

Today I found out he did use the UPSERT ALL and that my phrase "it is also <...> absolutely necessary to know against which base set the rows should be generated." is worded too strong.

He used this query:

SQL> select *
  2    from ( select *
  3             from mytable
  4            model
  5                  return updated rows
  6                  dimension by (code, col1, col2, col3)
  7                  measures (value)
  8                  rules upsert all
  9                  ( value ['MW006', col1, col2, col3]
 10                    = case
 11                      when value['MW001', cv(), cv(), cv()] is not null
 12                        or value['MW002', cv(), cv(), cv()] is not null
 13                      then
 14                          nvl(value['MW001', cv(), cv(), cv()],0)
 15                        + nvl(value['MW002', cv(), cv(), cv()],0)
 16                      end
 17                  , value ['MW007', col1, col2, col3]
 18                    = case
 19                      when value['MW006', cv(), cv(), cv()] is not null
 20                        or value['MW003', cv(), cv(), cv()] is not null
 21                      then
 22                          nvl(value['MW006', cv(), cv(), cv()],0)
 23                        - nvl(value['MW003', cv(), cv(), cv()],0)
 24                      end
 25                  , value ['MW008', col1, col2, col3]
 26                    = case
 27                      when value['MW004', cv(), cv(), cv()] is not null
 28                        or value['MW005', cv(), cv(), cv()] is not null
 29                      then
 30                          nvl(value['MW004', cv(), cv(), cv()],0)
 31                        + nvl(value['MW005', cv(), cv(), cv()],0)
 32                      end
 33                  , value ['MW009', col1, col2, col3]
 34                    = case
 35                      when value['MW007', cv(), cv(), cv()] is not null
 36                        or value['MW008', cv(), cv(), cv()] is not null
 37                      then
 38                          nvl(value['MW007', cv(), cv(), cv()],0)
 39                        + nvl(value['MW008', cv(), cv(), cv()],0)
 40                      end
 41                  , value ['MW014', col1, col2, col3]
 42                    = case
 43                      when value['MW011', cv(), cv(), cv()] is not null
 44                        or value['MW012', cv(), cv(), cv()] is not null
 45                      then
 46                          nvl(value['MW011', cv(), cv(), cv()],0)
 47                        + nvl(value['MW012', cv(), cv(), cv()],0)
 48                      end
 49                  , value ['MW015', col1, col2, col3]
 50                    = case
 51                      when value['MW013', cv(), cv(), cv()] is not null
 52                        or value['MW014', cv(), cv(), cv()] is not null
 53                      then
 54                          nvl(value['MW013', cv(), cv(), cv()],0)
 55                        + nvl(value['MW014', cv(), cv(), cv()],0)
 56                      end
 57                 )
 58         )
 59   where value is not null
 60   order by code
 61       , col1
 62       , col2
 63       , col3
 64  /

CODE        COL1       COL2       COL3      VALUE
----- ---------- ---------- ---------- ----------
MW006          1          1          1         32
MW006          1          1          2         34
MW006          1          1          3         36
MW006          1          1          4         38
MW007          1          1          1          1
MW007          1          1          2          2
MW007          1          1          3          3
MW007          1          1          4          4
MW008          1          1          1         92
MW008          1          1          2         94
MW008          1          1          3         96
MW008          1          1          4         98
MW009          1          1          1         93
MW009          1          1          2         96
MW009          1          1          3         99
MW009          1          1          4        102
MW014         11          1          1         32
MW014         11          1          2         34
MW014         11          1          3         36
MW015         11          1          1         63
MW015         11          1          2         66
MW015         11          1          3         69

22 rijen zijn geselecteerd.

The big advantages are that it is now not necessary anymore to refer to an original set and compared to the solution using a query per rule, it is a lot faster.

However, it is slower than using the extra dimension and referring to the base set. A lot of unnecessary rows are generated that are filtered out at the end by the predicate "value is not null". It feels a lot like forgetting a join condition, resulting in having lots of duplicate rows, and filtering them out by using the DISTINCT keyword.

Very ugly, but it has its advantages.

UPDATE TWO

Volder has left a comment showing an alternative query. By putting the col1, col2 and col3 columns as a partition, the rules become very easy. And as can be seen in the motto of this blog, I really like simple solutions. His solution is this:

SQL> select *
  2    from ( select code
  3                , col1
  4                , col2
  5                , col3
  6                , value
  7             from mytable
  8            model
  9                  return updated rows
 10                  partition by (col1, col2, col3)
 11                  dimension by (code)
 12                  measures(value)
 13                  rules
 14                  ( value['MW006'] = value['MW001'] + value['MW002']
 15                  , value['MW007'] = value['MW006'] - value['MW003']
 16                  , value['MW008'] = value['MW004'] + value['MW005']
 17                  , value['MW009'] = value['MW007'] + value['MW008']
 18                  , value['MW014'] = value['MW011'] + value['MW012']
 19                  , value['MW015'] = value['MW013'] + value['MW014']
 20                  )
 21         )
 22   where value is not null
 23   order by code
 24       , col1
 25       , col2
 26       , col3
 27  /

CODE        COL1       COL2       COL3      VALUE
----- ---------- ---------- ---------- ----------
MW006          1          1          1         32
MW006          1          1          2         34
MW006          1          1          3         36
MW006          1          1          4         38
MW007          1          1          1          1
MW007          1          1          2          2
MW007          1          1          3          3
MW007          1          1          4          4
MW008          1          1          1         92
MW008          1          1          2         94
MW008          1          1          3         96
MW008          1          1          4         98
MW009          1          1          1         93
MW009          1          1          2         96
MW009          1          1          3         99
MW009          1          1          4        102
MW014         11          1          1         32
MW014         11          1          2         34
MW014         11          1          3         36
MW015         11          1          1         63
MW015         11          1          2         66
MW015         11          1          3         69

22 rijen zijn geselecteerd.

The query creates seven partitions for the seven unique combinations of col1, col2 and col3. And for each of those partitions six new cells are calculated, resulting in 42 rows. By inlining the query and filtering the ones with value NULL, the query returns the 22 requested new rows.

So in essence it is the same the query that is currently being used, and which I described in the first update section. That query also generated too many rows - also 42 - that had to be filtered out in the outer where. However, you have to agree that Volder's query looks much much easier.

Personally I still like the query using UPSERT ALL and an extra dimension "code2" the most, because only the necessary rows are generated. The new one also feels like forgetting a join condition and using distinct. But Volder's query sure is superior on all aspects to the query currently being used.

4 comments:

VolderNovember 21, 2007 at 9:59 AM
Hi, Rob.
I didn't compare the perfomance, but it would look much simpler if you used:

SQL> select * from
2 (select code,col1,col2,col3,value from mytable
3 model
4 return updated rows
5 partition by (col1, col2, col3)
6 dimension by (code)
7 measures(value)
8 (value['MW006'] = value['MW001']+value['MW002'],
9 value['MW007'] = value['MW006']-value['MW003'],
10 value['MW008'] = value['MW004']+value['MW005'],
11 value['MW009'] = value['MW007']+value['MW008'],
12 value['MW014'] = value['MW011']+value['MW012'],
13 value['MW015'] = value['MW013']+value['MW014']
14 )
15 )
16 where value is not null
17 order by 1,2,3,4
18 /

CODE COL1 COL2 COL3 VALUE
----- ---------- ---------- ---------- ----------
MW006 1 1 1 32
MW006 1 1 2 34
MW006 1 1 3 36
MW006 1 1 4 38
MW007 1 1 1 1
MW007 1 1 2 2
MW007 1 1 3 3
MW007 1 1 4 4
MW008 1 1 1 92
MW008 1 1 2 94
MW008 1 1 3 96
MW008 1 1 4 98
MW009 1 1 1 93
MW009 1 1 2 96
MW009 1 1 3 99
MW009 1 1 4 102
MW014 11 1 1 32
MW014 11 1 2 34
MW014 11 1 3 36
MW015 11 1 1 63
MW015 11 1 2 66
MW015 11 1 3 69

22 rows selected

SQL>
Rob van WijkNovember 21, 2007 at 3:21 PM
Hi Volder,

Thanks very much for your comment.
I've added a second update section to the blog entry because of it.

Regards,
Rob.
DirkJune 16, 2008 at 1:30 PM
Hello Rob, Volder,

A rather late response, still, as they say, better late than never. Thank you both for helping me with this issue. Rob, you described the solution I used some time ago in the ‘Update’ section. Since then, in the words of Bob Dylan, ‘Things have changed’. The current and presumably final version of the model clause query takes advantage of the partitioning mentioned by Volder. I refrained from using Volder’s solution initially because we have a number of rules that require aggregation. Such a rule sums the values from a group of dimensions (the grouping could change depending on the aggregation). To give an example (based on your code in the update section):

9 ( value ['MW006', ‘predefined value col1’, ‘predefined value col2’, col3]
10 = sum(value)['MW001', any, any, cv()]
11 , value ['MW007', col1, col2, col3]
12 = case
13 when value['MW006', ‘predefined value col1’, ‘predefined value col2’, cv()] is not null
14 or value['MW003', cv(), cv(), cv()] is not null
15 then
16 nvl(value['MW006', ‘predefined value col1’, ‘predefined value col1’, cv()],0)
17 * nvl(value['MW003', cv(), cv(), cv()],0)
18 end

Gradually however performance issues forced me to introduce a greater degree of parallelisation. As you know, by moving columns from the ‘dimension by’ clause to the ‘partition by’ clause boundary points are introduced that help execute the query in parallel. A small test showed that partitioning by all columns except ‘code’ and ‘value’ offered tremendous performance gains (with much more elegant looking code).

The drawback is that a rule with an aggregation, like the one above, no longer returns the correct outcome, for the rule cannot cross the boundary points defined by the ‘partition by’ clause. In other words, the ‘any’ statements used when summing refer to the rows within the set defined by the boundary points.

To solve this, the code was split by removing the aggregations from the model clause query to another query. The latter query takes care of the aggregations in advance and stores them in a table which is then referenced by the model clause. The model clause was rewritten to allow for greater parallelisation and to take into account the references to the table containing the aggregations. If similar action was undertaken for the example the code above would end up looking like this:

9 value ['MW007]
10 = case
11 when aggregated_value['MW006', cv(col3)] is not null
12 or value['MW003', cv(), cv(), cv()] is not null
13 then
14 nvl(aggregated_value['MW006', cv(col3)],0)
15 * nvl(value['MW003', cv(), cv(), cv()],0)
16 end

The ‘MW006’ rule is absent from the code since it would be calculated in advance. The ‘aggregated_value’ measure is a referenced value which has ‘code’ and ‘col3’ as ‘dimension by’ values (the predefined values used earlier were only used to fill up the dimensions not relevant to an aggregation’s grouping).

The disadvantages of this approach is that unnecessary rows are generated and the code (the business logic) is split and held in different places. The advantages are better performance (at least in my case) and code which is more understandable and much shorter.

Thanks again,

Dirk
Rob van WijkJune 27, 2008 at 10:55 AM
Dirk,

Thanks for letting us know.

I don't know exactly how the current situation looks like, but storing aggregated values in a separate table sounds suspicious. Did you take a look at reference models?

Regards,
Rob.

About Oracle

Pages

Monday, November 12, 2007

UPSERT ALL

UPDATE

UPDATE TWO

4 comments:

About me

My APEX applications

Upcoming presentations

Twitter

Oracle related links

Oracle database and APEX blogs

Blog Archive

Total Pageviews