Pages

Monday, November 12, 2007

UPSERT ALL

A few weeks ago, I received a question from a BI-colleague who had a problem with a query using the model clause in a materialized view. He wanted to generate new sets of data according to some rules, but he got stuck. The query he used was huge: approximately 680 packed lines containing a model with 100+ rules. It used 12 dimension columns and operated on a table containing 90K rows.

The question was very interesting though and I’ll try to simulate it with a smaller table like this:

SQL> create table mytable (code, col1, col2, col3, value)
2 as
3 select 'MW001', 1, 1, 1, 11 from dual union all
4 select 'MW001', 1, 1, 2, 12 from dual union all
5 select 'MW001', 1, 1, 3, 13 from dual union all
6 select 'MW001', 1, 1, 4, 14 from dual union all
7 select 'MW002', 1, 1, 1, 21 from dual union all
8 select 'MW002', 1, 1, 2, 22 from dual union all
9 select 'MW002', 1, 1, 3, 23 from dual union all
10 select 'MW002', 1, 1, 4, 24 from dual union all
11 select 'MW003', 1, 1, 1, 31 from dual union all
12 select 'MW003', 1, 1, 2, 32 from dual union all
13 select 'MW003', 1, 1, 3, 33 from dual union all
14 select 'MW003', 1, 1, 4, 34 from dual union all
15 select 'MW004', 1, 1, 1, 41 from dual union all
16 select 'MW004', 1, 1, 2, 42 from dual union all
17 select 'MW004', 1, 1, 3, 43 from dual union all
18 select 'MW004', 1, 1, 4, 44 from dual union all
19 select 'MW005', 1, 1, 1, 51 from dual union all
20 select 'MW005', 1, 1, 2, 52 from dual union all
21 select 'MW005', 1, 1, 3, 53 from dual union all
22 select 'MW005', 1, 1, 4, 54 from dual union all
23 select 'MW011', 11, 1, 1, 11 from dual union all
24 select 'MW011', 11, 1, 2, 12 from dual union all
25 select 'MW011', 11, 1, 3, 13 from dual union all
26 select 'MW012', 11, 1, 1, 21 from dual union all
27 select 'MW012', 11, 1, 2, 22 from dual union all
28 select 'MW012', 11, 1, 3, 23 from dual union all
29 select 'MW013', 11, 1, 1, 31 from dual union all
30 select 'MW013', 11, 1, 2, 32 from dual union all
31 select 'MW013', 11, 1, 3, 33 from dual
32 /

Tabel is aangemaakt.


In this table there are 8 sets of data, identified by the column called code. The col1, col2 and col3 values identify the rows within a set. Together with the code column they can be considered the primary key. The purpose is to generate new sets of data, using the same key values col1, col2 and col3, but with a simple formula to calculate the value column. In the example I will use the following rules to simulate the original problem

MW006 == MW001 + MW002
MW007 == MW006 - MW003
MW008 == MW004 + MW005
MW009 == MW007 + MW008
MW014 == MW011 + MW012
MW015 == MW013 + MW014

These rules would have to lead to the following extra rows:

 CODE        COL1       COL2       COL3      VALUE
----- ---------- ---------- ---------- ----------
MW006 1 1 1 32
MW006 1 1 2 34
MW006 1 1 3 36
MW006 1 1 4 38
MW007 1 1 1 1
MW007 1 1 2 2
MW007 1 1 3 3
MW007 1 1 4 4
MW008 1 1 1 92
MW008 1 1 2 94
MW008 1 1 3 96
MW008 1 1 4 98
MW009 1 1 1 93
MW009 1 1 2 96
MW009 1 1 3 99
MW009 1 1 4 102
MW014 11 1 1 32
MW014 11 1 2 34
MW014 11 1 3 36
MW015 11 1 1 63
MW015 11 1 2 66
MW015 11 1 3 69


The value 32 in the required output’s first row is the result from 11 + 21, and 34 = 12 + 22 and so on. Note that the col1, col2 and col3 match exactly within each set.

If you look back at the create table statement, you will notice that are two set types MW001 – MW005, all having four identical key values, and MW011 – MW013 having three identical key values. Of course it is only possible to use sets within one set type in the rules. This is a precondition for this problem and not something we have to validate being true.

My colleague had read the documentation very carefully and used the same construction as recommended in it. Here is a link to that part of the documentation, just scroll down a little bit. He used the construct as described in the part about adding new sales for Poland, which resulted in this query:

SQL> select *
2 from mytable
3 model
4 return updated rows
5 dimension by (code, col1, col2, col3)
6 measures (value)
7 rules
8 ( value
9 [ for (code, col1, col2, col3) in
10 ( select distinct 'MW006', col1, col2, col3
11 from mytable
12 where code = 'MW001'
13 )
14 ] = value['MW001', cv(), cv(), cv()]
15 + value['MW002', cv(), cv(), cv()]
16 )
17 order by code
18 , col1
19 , col2
20 , col3
21 /

CODE COL1 COL2 COL3 VALUE
----- ---------- ---------- ---------- ----------
MW006 1 1 1 32
MW006 1 1 2 34
MW006 1 1 3 36
MW006 1 1 4 38

4 rijen zijn geselecteerd.


My colleague got stuck with this problem because he used something similar as below for the rule for MW007:

SQL> select *
2 from mytable
3 model
4 return updated rows
5 dimension by (code, col1, col2, col3)
6 measures (value)
7 rules
8 ( -- MW006
9 value
10 [ for (code, col1, col2, col3) in
11 ( select distinct 'MW006', col1, col2, col3
12 from mytable
13 where code = 'MW001'
14 )
15 ] = value['MW001', cv(), cv(), cv()]
16 + value['MW002', cv(), cv(), cv()]
17 , -- MW007
18 value
19 [ for (code, col1, col2, col3) in
20 ( select distinct 'MW007', col1, col2, col3
21 from mytable
22 where code = 'MW006'
23 )
24 ] = value['MW006', cv(), cv(), cv()]
25 - value['MW003', cv(), cv(), cv()]
26 )
27 order by code
28 , col1
29 , col2
30 , col3
31 /

CODE COL1 COL2 COL3 VALUE
----- ---------- ---------- ---------- ----------
MW006 1 1 1 32
MW006 1 1 2 34
MW006 1 1 3 36
MW006 1 1 4 38

4 rijen zijn geselecteerd.


And the four rows of set MW007 are not generated. It does not work because the model clause does not insert rows into the table: it only generates a result set. So selecting rows “where code = ‘MW006’” will never retrieve a row. So this line of thought is clearly wrong. To make this query work, it is absolutely necessary to know which base set should be used to generate the rows. If we change the query for the second rule to use the predicate “code = ‘MW001’” instead, the query works:

SQL> select *
2 from mytable
3 model
4 return updated rows
5 dimension by (code, col1, col2, col3)
6 measures (value)
7 rules
8 ( -- MW006
9 value
10 [ for (code, col1, col2, col3) in
11 ( select distinct 'MW006', col1, col2, col3
12 from mytable
13 where code = 'MW001'
14 )
15 ] = value['MW001', cv(), cv(), cv()]
16 + value['MW002', cv(), cv(), cv()]
17 , -- MW007
18 value
19 [ for (code, col1, col2, col3) in
20 ( select distinct 'MW007', col1, col2, col3
21 from mytable
22 where code = 'MW001'
23 )
24 ] = value['MW006', cv(), cv(), cv()]
25 - value['MW003', cv(), cv(), cv()]
26 )
27 order by code
28 , col1
29 , col2
30 , col3
31 /

CODE COL1 COL2 COL3 VALUE
----- ---------- ---------- ---------- ----------
MW006 1 1 1 32
MW006 1 1 2 34
MW006 1 1 3 36
MW006 1 1 4 38
MW007 1 1 1 1
MW007 1 1 2 2
MW007 1 1 3 3
MW007 1 1 4 4

8 rijen zijn geselecteerd.


But why does Oracle advice to use a separate query to generate a new set? It is a new query that has to be executed, and in the original case there would be 100+ of them. Now if it was the only way to achieve the result …, but it’s not: it’s what the UPSERT ALL clause is all about! In my experience the UPSERT ALL clause is rarely used, but this is a showcase. However, you have to know exactly how it works, or you might get surprised by this behaviour:

SQL> select *
2 from mytable
3 model
4 return updated rows
5 dimension by (code, col1, col2, col3)
6 measures (value)
7 rules upsert all
8 ( value ['MW006', col1, col2, col3]
9 = value['MW001', cv(), cv(), cv()]
10 + value['MW002', cv(), cv(), cv()]
11 )
12 order by code
13 , col1
14 , col2
15 , col3
16 /

CODE COL1 COL2 COL3 VALUE
----- ---------- ---------- ---------- ----------
MW006 1 1 1 32
MW006 1 1 2 34
MW006 1 1 3 36
MW006 1 1 4 38
MW006 11 1 1
MW006 11 1 2
MW006 11 1 3

7 rijen zijn geselecteerd.


Here 7 rows are selected instead of 4, because – as documented – all unique combinations that satisfy the non positional references, being col1, col2 and col3, are identified. Since I want only four I have to introduce an extra dimension value code2 based on code:

SQL> select *
2 from mytable
3 model
4 return updated rows
5 dimension by (code, code code2, col1, col2, col3)
6 measures (value)
7 rules upsert all
8 ( value ['MW006', code2 = 'MW001', col1, col2, col3]
9 = value['MW001', 'MW001', cv(), cv(), cv()]
10 + value['MW002', 'MW002', cv(), cv(), cv()]
11 )
12 order by code
13 , col1
14 , col2
15 , col3
16 /

CODE CODE2 COL1 COL2 COL3 VALUE
----- ----- ---------- ---------- ---------- ----------
MW006 MW001 1 1 1 32
MW006 MW001 1 1 2 34
MW006 MW001 1 1 3 36
MW006 MW001 1 1 4 38

4 rijen zijn geselecteerd.


For this solution it is also –just as the solution using a query per rule- absolutely necessary to know against which base set the rows should be generated. This is a tedious task in a situation with 100+ rules, but there is no way around that. At least a lot of inner queries are avoided using UPSERT ALL.

I ended up suggesting a third alternative. However, looking back on it, I am not at all happy with this advice because the third alternative doesn’t relieve the fact of having to know the base set. I guess I was too disappointed about not getting to the point of not having to know the base set, that I turned down the UPSERT ALL variant altogether.
The third alternative was this:

SQL> select *
2 from mytable
3 model
4 return updated rows
5 reference r on (select code, count(*) countcode from mytable group by code)
6 dimension by (code coderef)
7 measures (countcode)
8 main m
9 dimension by (code, row_number() over (partition by code order by col1, col2, col3) rn)
10 measures (col1, col2, col3, value)
11 rules
12 ( col1['MW006',for rn from 1 to countcode['MW001'] increment 1] = col1['MW001',cv()]
13 , col2['MW006',any] = col2['MW001',cv()]
14 , col3['MW006',any] = col3['MW001',cv()]
15 , value['MW006',any] = value['MW001',cv()] + value['MW002',cv()]
16 , col1['MW007',for rn from 1 to countcode['MW001'] increment 1] = col1['MW001',cv()]
17 , col2['MW007',any] = col2['MW001',cv()]
18 , col3['MW007',any] = col3['MW001',cv()]
19 , value['MW007',any] = value['MW006',cv()] - value['MW003',cv()]
20 , col1['MW008',for rn from 1 to countcode['MW004'] increment 1] = col1['MW001',cv()]
21 , col2['MW008',any] = col2['MW001',cv()]
22 , col3['MW008',any] = col3['MW001',cv()]
23 , value['MW008',any] = value['MW004',cv()] + value['MW005',cv()]
24 , col1['MW009',for rn from 1 to countcode['MW001'] increment 1] = col1['MW001',cv()]
25 , col2['MW009',any] = col2['MW001',cv()]
26 , col3['MW009',any] = col3['MW001',cv()]
27 , value['MW009',any] = value['MW007',cv()] + value['MW008',cv()]
28 , col1['MW014',for rn from 1 to countcode['MW011'] increment 1] = col1['MW011',cv()]
29 , col2['MW014',any] = col2['MW011',cv()]
30 , col3['MW014',any] = col3['MW011',cv()]
31 , value['MW014',any] = value['MW011',cv()] + value['MW012',cv()]
32 , col1['MW015',for rn from 1 to countcode['MW011'] increment 1] = col1['MW011',cv()]
33 , col2['MW015',any] = col2['MW011',cv()]
34 , col3['MW015',any] = col3['MW011',cv()]
35 , value['MW015',any] = value['MW013',cv()] + value['MW014',cv()]
36 )
37 order by code
38 , col1
39 , col2
40 , col3
41 /

CODE RN COL1 COL2 COL3 VALUE
----- ---------- ---------- ---------- ---------- ----------
MW006 1 1 1 1 32
MW006 2 1 1 2 34
MW006 3 1 1 3 36
MW006 4 1 1 4 38
MW007 1 1 1 1 1
MW007 2 1 1 2 2
MW007 3 1 1 3 3
MW007 4 1 1 4 4
MW008 1 1 1 1 92
MW008 2 1 1 2 94
MW008 3 1 1 3 96
MW008 4 1 1 4 98
MW009 1 1 1 1 93
MW009 2 1 1 2 96
MW009 3 1 1 3 99
MW009 4 1 1 4 102
MW014 1 11 1 1 32
MW014 2 11 1 2 34
MW014 3 11 1 3 36
MW015 1 11 1 1 63
MW015 2 11 1 2 66
MW015 3 11 1 3 69

22 rijen zijn geselecteerd.


The key point here is to use the FOR loop to generate the rows. To do so I need to index all entries by means of the ROW_NUMBER analytic function, and use a reference model to know how many rows to generate. It works, but it is more complex compared to the UPSERT ALL alternative and there are a lot more rules this way, because of the increase in measure values.

Luckily they turned down the third alternative, but they could have used the UPSERT ALL perfectly. The advice came too late however, so they headed towards a totally different approach. Very likely something which does not involve using materialized views with a model clause query.

The complete UPSERT ALL solution looks like this:

SQL> select *
2 from mytable
3 model
4 return updated rows
5 dimension by (code, code code2, col1, col2, col3)
6 measures (value)
7 rules upsert all
8 ( value ['MW006', code2 = 'MW001', col1, col2, col3]
9 = value['MW001', 'MW001', cv(), cv(), cv()]
10 + value['MW002', 'MW002', cv(), cv(), cv()]
11 , value ['MW007', code2 = 'MW001', col1, col2, col3]
12 = value['MW006', 'MW001', cv(), cv(), cv()]
13 - value['MW003', 'MW003', cv(), cv(), cv()]
14 , value ['MW008', code2 = 'MW001', col1, col2, col3]
15 = value['MW004', 'MW004', cv(), cv(), cv()]
16 + value['MW005', 'MW005', cv(), cv(), cv()]
17 , value ['MW009', code2 = 'MW001', col1, col2, col3]
18 = value['MW007', 'MW001', cv(), cv(), cv()]
19 + value['MW008', 'MW001', cv(), cv(), cv()]
20 , value ['MW014', code2 = 'MW011', col1, col2, col3]
21 = value['MW011', 'MW011', cv(), cv(), cv()]
22 + value['MW012', 'MW012', cv(), cv(), cv()]
23 , value ['MW015', code2 = 'MW011', col1, col2, col3]
24 = value['MW013', 'MW013', cv(), cv(), cv()]
25 + value['MW014', 'MW011', cv(), cv(), cv()]
26 )
27 order by code
28 , col1
29 , col2
30 , col3
31 /

CODE CODE2 COL1 COL2 COL3 VALUE
----- ----- ---------- ---------- ---------- ----------
MW006 MW001 1 1 1 32
MW006 MW001 1 1 2 34
MW006 MW001 1 1 3 36
MW006 MW001 1 1 4 38
MW007 MW001 1 1 1 1
MW007 MW001 1 1 2 2
MW007 MW001 1 1 3 3
MW007 MW001 1 1 4 4
MW008 MW001 1 1 1 92
MW008 MW001 1 1 2 94
MW008 MW001 1 1 3 96
MW008 MW001 1 1 4 98
MW009 MW001 1 1 1 93
MW009 MW001 1 1 2 96
MW009 MW001 1 1 3 99
MW009 MW001 1 1 4 102
MW014 MW011 11 1 1 32
MW014 MW011 11 1 2 34
MW014 MW011 11 1 3 36
MW015 MW011 11 1 1 63
MW015 MW011 11 1 2 66
MW015 MW011 11 1 3 69

22 rijen zijn geselecteerd.


Or maybe for sake of clarity you could separate the generating of new rows based on original base sets from the calculation of the new values like this:

SQL> select *
2 from ( select code, col1, col2, col3, value
3 from mytable
4 model
5 dimension by (code, code code2, col1, col2, col3)
6 measures (value)
7 rules upsert all
8 ( value['MW006', code2 = 'MW001', col1, col2, col3] = 0
9 , value['MW007', code2 = 'MW001', col1, col2, col3] = 0
10 , value['MW008', code2 = 'MW001', col1, col2, col3] = 0
11 , value['MW009', code2 = 'MW001', col1, col2, col3] = 0
12 , value['MW014', code2 = 'MW011', col1, col2, col3] = 0
13 , value['MW015', code2 = 'MW011', col1, col2, col3] = 0
14 )
15 )
16 model
17 return updated rows
18 dimension by (code, col1, col2, col3)
19 measures (value)
20 rules automatic order
21 ( value['MW006', col1, col2, col3]
22 = value['MW001', cv(), cv(), cv()]
23 + value['MW002', cv(), cv(), cv()]
24 , value['MW008', col1, col2, col3]
25 = value['MW004', cv(), cv(), cv()]
26 + value['MW005', cv(), cv(), cv()]
27 , value['MW015', col1, col2, col3]
28 = value['MW013', cv(), cv(), cv()]
29 + value['MW014', cv(), cv(), cv()]
30 , value['MW009', col1, col2, col3]
31 = value['MW007', cv(), cv(), cv()]
32 + value['MW008', cv(), cv(), cv()]
33 , value['MW014', col1, col2, col3]
34 = value['MW011', cv(), cv(), cv()]
35 + value['MW012', cv(), cv(), cv()]
36 , value['MW007', col1, col2, col3]
37 = value['MW006', cv(), cv(), cv()]
38 - value['MW003', cv(), cv(), cv()]
39 )
40 order by code
41 , col1
42 , col2
43 , col3
44 /

CODE COL1 COL2 COL3 VALUE
----- ---------- ---------- ---------- ----------
MW006 1 1 1 32
MW006 1 1 2 34
MW006 1 1 3 36
MW006 1 1 4 38
MW007 1 1 1 1
MW007 1 1 2 2
MW007 1 1 3 3
MW007 1 1 4 4
MW008 1 1 1 92
MW008 1 1 2 94
MW008 1 1 3 96
MW008 1 1 4 98
MW009 1 1 1 93
MW009 1 1 2 96
MW009 1 1 3 99
MW009 1 1 4 102
MW014 11 1 1 32
MW014 11 1 2 34
MW014 11 1 3 36
MW015 11 1 1 63
MW015 11 1 2 66
MW015 11 1 3 69

22 rijen zijn geselecteerd.


UPDATE


Today I found out he did use the UPSERT ALL and that my phrase "it is also <...> absolutely necessary to know against which base set the rows should be generated." is worded too strong.

He used this query:

SQL> select *
2 from ( select *
3 from mytable
4 model
5 return updated rows
6 dimension by (code, col1, col2, col3)
7 measures (value)
8 rules upsert all
9 ( value ['MW006', col1, col2, col3]
10 = case
11 when value['MW001', cv(), cv(), cv()] is not null
12 or value['MW002', cv(), cv(), cv()] is not null
13 then
14 nvl(value['MW001', cv(), cv(), cv()],0)
15 + nvl(value['MW002', cv(), cv(), cv()],0)
16 end
17 , value ['MW007', col1, col2, col3]
18 = case
19 when value['MW006', cv(), cv(), cv()] is not null
20 or value['MW003', cv(), cv(), cv()] is not null
21 then
22 nvl(value['MW006', cv(), cv(), cv()],0)
23 - nvl(value['MW003', cv(), cv(), cv()],0)
24 end
25 , value ['MW008', col1, col2, col3]
26 = case
27 when value['MW004', cv(), cv(), cv()] is not null
28 or value['MW005', cv(), cv(), cv()] is not null
29 then
30 nvl(value['MW004', cv(), cv(), cv()],0)
31 + nvl(value['MW005', cv(), cv(), cv()],0)
32 end
33 , value ['MW009', col1, col2, col3]
34 = case
35 when value['MW007', cv(), cv(), cv()] is not null
36 or value['MW008', cv(), cv(), cv()] is not null
37 then
38 nvl(value['MW007', cv(), cv(), cv()],0)
39 + nvl(value['MW008', cv(), cv(), cv()],0)
40 end
41 , value ['MW014', col1, col2, col3]
42 = case
43 when value['MW011', cv(), cv(), cv()] is not null
44 or value['MW012', cv(), cv(), cv()] is not null
45 then
46 nvl(value['MW011', cv(), cv(), cv()],0)
47 + nvl(value['MW012', cv(), cv(), cv()],0)
48 end
49 , value ['MW015', col1, col2, col3]
50 = case
51 when value['MW013', cv(), cv(), cv()] is not null
52 or value['MW014', cv(), cv(), cv()] is not null
53 then
54 nvl(value['MW013', cv(), cv(), cv()],0)
55 + nvl(value['MW014', cv(), cv(), cv()],0)
56 end
57 )
58 )
59 where value is not null
60 order by code
61 , col1
62 , col2
63 , col3
64 /

CODE COL1 COL2 COL3 VALUE
----- ---------- ---------- ---------- ----------
MW006 1 1 1 32
MW006 1 1 2 34
MW006 1 1 3 36
MW006 1 1 4 38
MW007 1 1 1 1
MW007 1 1 2 2
MW007 1 1 3 3
MW007 1 1 4 4
MW008 1 1 1 92
MW008 1 1 2 94
MW008 1 1 3 96
MW008 1 1 4 98
MW009 1 1 1 93
MW009 1 1 2 96
MW009 1 1 3 99
MW009 1 1 4 102
MW014 11 1 1 32
MW014 11 1 2 34
MW014 11 1 3 36
MW015 11 1 1 63
MW015 11 1 2 66
MW015 11 1 3 69

22 rijen zijn geselecteerd.


The big advantages are that it is now not necessary anymore to refer to an original set and compared to the solution using a query per rule, it is a lot faster.

However, it is slower than using the extra dimension and referring to the base set. A lot of unnecessary rows are generated that are filtered out at the end by the predicate "value is not null". It feels a lot like forgetting a join condition, resulting in having lots of duplicate rows, and filtering them out by using the DISTINCT keyword.

Very ugly, but it has its advantages.

UPDATE TWO


Volder has left a comment showing an alternative query. By putting the col1, col2 and col3 columns as a partition, the rules become very easy. And as can be seen in the motto of this blog, I really like simple solutions. His solution is this:

SQL> select *
2 from ( select code
3 , col1
4 , col2
5 , col3
6 , value
7 from mytable
8 model
9 return updated rows
10 partition by (col1, col2, col3)
11 dimension by (code)
12 measures(value)
13 rules
14 ( value['MW006'] = value['MW001'] + value['MW002']
15 , value['MW007'] = value['MW006'] - value['MW003']
16 , value['MW008'] = value['MW004'] + value['MW005']
17 , value['MW009'] = value['MW007'] + value['MW008']
18 , value['MW014'] = value['MW011'] + value['MW012']
19 , value['MW015'] = value['MW013'] + value['MW014']
20 )
21 )
22 where value is not null
23 order by code
24 , col1
25 , col2
26 , col3
27 /

CODE COL1 COL2 COL3 VALUE
----- ---------- ---------- ---------- ----------
MW006 1 1 1 32
MW006 1 1 2 34
MW006 1 1 3 36
MW006 1 1 4 38
MW007 1 1 1 1
MW007 1 1 2 2
MW007 1 1 3 3
MW007 1 1 4 4
MW008 1 1 1 92
MW008 1 1 2 94
MW008 1 1 3 96
MW008 1 1 4 98
MW009 1 1 1 93
MW009 1 1 2 96
MW009 1 1 3 99
MW009 1 1 4 102
MW014 11 1 1 32
MW014 11 1 2 34
MW014 11 1 3 36
MW015 11 1 1 63
MW015 11 1 2 66
MW015 11 1 3 69

22 rijen zijn geselecteerd.


The query creates seven partitions for the seven unique combinations of col1, col2 and col3. And for each of those partitions six new cells are calculated, resulting in 42 rows. By inlining the query and filtering the ones with value NULL, the query returns the 22 requested new rows.

So in essence it is the same the query that is currently being used, and which I described in the first update section. That query also generated too many rows - also 42 - that had to be filtered out in the outer where. However, you have to agree that Volder's query looks much much easier.

Personally I still like the query using UPSERT ALL and an extra dimension "code2" the most, because only the necessary rows are generated. The new one also feels like forgetting a join condition and using distinct. But Volder's query sure is superior on all aspects to the query currently being used.

4 comments:

  1. Hi, Rob.
    I didn't compare the perfomance, but it would look much simpler if you used:

    SQL> select * from
    2 (select code,col1,col2,col3,value from mytable
    3 model
    4 return updated rows
    5 partition by (col1, col2, col3)
    6 dimension by (code)
    7 measures(value)
    8 (value['MW006'] = value['MW001']+value['MW002'],
    9 value['MW007'] = value['MW006']-value['MW003'],
    10 value['MW008'] = value['MW004']+value['MW005'],
    11 value['MW009'] = value['MW007']+value['MW008'],
    12 value['MW014'] = value['MW011']+value['MW012'],
    13 value['MW015'] = value['MW013']+value['MW014']
    14 )
    15 )
    16 where value is not null
    17 order by 1,2,3,4
    18 /

    CODE COL1 COL2 COL3 VALUE
    ----- ---------- ---------- ---------- ----------
    MW006 1 1 1 32
    MW006 1 1 2 34
    MW006 1 1 3 36
    MW006 1 1 4 38
    MW007 1 1 1 1
    MW007 1 1 2 2
    MW007 1 1 3 3
    MW007 1 1 4 4
    MW008 1 1 1 92
    MW008 1 1 2 94
    MW008 1 1 3 96
    MW008 1 1 4 98
    MW009 1 1 1 93
    MW009 1 1 2 96
    MW009 1 1 3 99
    MW009 1 1 4 102
    MW014 11 1 1 32
    MW014 11 1 2 34
    MW014 11 1 3 36
    MW015 11 1 1 63
    MW015 11 1 2 66
    MW015 11 1 3 69

    22 rows selected

    SQL>

    ReplyDelete
  2. Hi Volder,

    Thanks very much for your comment.
    I've added a second update section to the blog entry because of it.

    Regards,
    Rob.

    ReplyDelete
  3. Hello Rob, Volder,

    A rather late response, still, as they say, better late than never. Thank you both for helping me with this issue. Rob, you described the solution I used some time ago in the ‘Update’ section. Since then, in the words of Bob Dylan, ‘Things have changed’. The current and presumably final version of the model clause query takes advantage of the partitioning mentioned by Volder. I refrained from using Volder’s solution initially because we have a number of rules that require aggregation. Such a rule sums the values from a group of dimensions (the grouping could change depending on the aggregation). To give an example (based on your code in the update section):

    9 ( value ['MW006', ‘predefined value col1’, ‘predefined value col2’, col3]
    10 = sum(value)['MW001', any, any, cv()]
    11 , value ['MW007', col1, col2, col3]
    12 = case
    13 when value['MW006', ‘predefined value col1’, ‘predefined value col2’, cv()] is not null
    14 or value['MW003', cv(), cv(), cv()] is not null
    15 then
    16 nvl(value['MW006', ‘predefined value col1’, ‘predefined value col1’, cv()],0)
    17 * nvl(value['MW003', cv(), cv(), cv()],0)
    18 end

    Gradually however performance issues forced me to introduce a greater degree of parallelisation. As you know, by moving columns from the ‘dimension by’ clause to the ‘partition by’ clause boundary points are introduced that help execute the query in parallel. A small test showed that partitioning by all columns except ‘code’ and ‘value’ offered tremendous performance gains (with much more elegant looking code).

    The drawback is that a rule with an aggregation, like the one above, no longer returns the correct outcome, for the rule cannot cross the boundary points defined by the ‘partition by’ clause. In other words, the ‘any’ statements used when summing refer to the rows within the set defined by the boundary points.

    To solve this, the code was split by removing the aggregations from the model clause query to another query. The latter query takes care of the aggregations in advance and stores them in a table which is then referenced by the model clause. The model clause was rewritten to allow for greater parallelisation and to take into account the references to the table containing the aggregations. If similar action was undertaken for the example the code above would end up looking like this:

    9 value ['MW007]
    10 = case
    11 when aggregated_value['MW006', cv(col3)] is not null
    12 or value['MW003', cv(), cv(), cv()] is not null
    13 then
    14 nvl(aggregated_value['MW006', cv(col3)],0)
    15 * nvl(value['MW003', cv(), cv(), cv()],0)
    16 end

    The ‘MW006’ rule is absent from the code since it would be calculated in advance. The ‘aggregated_value’ measure is a referenced value which has ‘code’ and ‘col3’ as ‘dimension by’ values (the predefined values used earlier were only used to fill up the dimensions not relevant to an aggregation’s grouping).

    The disadvantages of this approach is that unnecessary rows are generated and the code (the business logic) is split and held in different places. The advantages are better performance (at least in my case) and code which is more understandable and much shorter.

    Thanks again,

    Dirk

    ReplyDelete
  4. Dirk,

    Thanks for letting us know.

    I don't know exactly how the current situation looks like, but storing aggregated values in a separate table sounds suspicious. Did you take a look at reference models?

    Regards,
    Rob.

    ReplyDelete