Thursday, January 17, 2008

Flexible attributes and performance

The client I currently work for has quite a few special tables containing flexible attributes. These tables have one foreign key to another table, the base table. The attributes in the table containing the flexibal attributes conceptually belong to the base table, but their attribute values can change over time - the table contains a startdate and an enddate. And it is important that one can see the value of those attributes at any given time.

Today I was working on optimizing a large query that involved (among many others...) a base table of 4M rows, and a flexible attributes table of 40M rows containing approximately 20 flexible attributes. The query needed the current value of 7 of those flexible attributes. I did some testing about what would be the best approach to query all those values. And because I think it is a case that is certainly not rare, it is now here on my blog.

Here is a test case to simulate the situation:

rwijk@ORA11G> create table base (id, description)
2 as
3 select level
4 , 'description' || to_char(level)
5 from dual
6 connect by level <= 10000
7 /

Tabel is aangemaakt.

rwijk@ORA11G> create table flexattributes_of_base
2 (id,attribute,base_id,value,startdate,enddate)
3 as
4 select level
5 , 'attribute' || to_char(mod(level-1,20))
6 , ceil(level/40)
7 , to_char(trunc(dbms_random.value(1,11)))
8 , case trunc(mod(level-1,40)/20)
9 when 0 then date '2007-01-01'
10 else date '2008-01-01'
11 end
12 , case trunc(mod(level-1,40)/20)
13 when 0 then date '2008-01-01'
14 end
15 from dual
16 connect by level <= 400000
17 /

Tabel is aangemaakt.

rwijk@ORA11G> remark
rwijk@ORA11G> remark Insert attribute20 with only non-active values
rwijk@ORA11G> remark
rwijk@ORA11G> insert into flexattributes_of_base
2 select 400000 + level
3 , 'attribute20'
4 , level
5 , 'a'
6 , date '2007-01-01'
7 , date '2008-01-01'
8 from dual
9 connect by level <= 10000
10 /

10000 rijen zijn aangemaakt.

rwijk@ORA11G> remark
rwijk@ORA11G> remark Insert attribute21 with more than one active value per base_id,
rwijk@ORA11G> remark which is garbage of course,
rwijk@ORA11G> remark
rwijk@ORA11G> insert into flexattributes_of_base
2 select 410000 + level
3 , 'attribute21'
4 , mod(level-1,10000) + 1
5 , 'b'
6 , case
7 when level <= 10000 then date '2007-01-01'
8 else date '2008-01-01'
9 end
10 , null
11 from dual
12 connect by level <= 20000
13 /

20000 rijen zijn aangemaakt.

rwijk@ORA11G> remark
rwijk@ORA11G> remark Insert attribute22 only for base_id 1
rwijk@ORA11G> remark
rwijk@ORA11G> insert into flexattributes_of_base
2 values
3 ( 430001
4 , 'attribute22'
5 , 1
6 , 'c'
7 , date '2007-01-01'
8 , null
9 )
10 /

1 rij is aangemaakt.

rwijk@ORA11G> alter table base add constraint base_pk primary key (id)
2 /

Tabel is gewijzigd.

rwijk@ORA11G> alter table flexattributes_of_base
2 add constraint flex_pk primary key (id)
3 /

Tabel is gewijzigd.

rwijk@ORA11G> alter table flexattributes_of_base
2 add constraint flex_uk1 unique (attribute,base_id,startdate)
3 /

Tabel is gewijzigd.

rwijk@ORA11G> exec dbms_stats.gather_table_stats(user,'base',cascade=>true)

PL/SQL-procedure is geslaagd.

rwijk@ORA11G> begin
2 dbms_stats.gather_table_stats
3 ( user
4 , 'flexattributes_of_base'
5 , cascade=>true
6 , method_opt=>'FOR ALL INDEXED COLUMNS'
7 );
8 end;
9 /

PL/SQL-procedure is geslaagd.


So in this testcase, the base table contains 10,000 rows, and the flex_attributes_of_base table contains 430,001 rows. There are 23 flexible attributes ('attribute0' until 'attribute22'). Every record in the base table has two records for each of the first 20 attributes (0-19). One for 2007-01-01 until 2008-01-01 and one for 2008-01-01 and upwards. The next three attributes are slightly different - you may even call it buggy - just to see if the queries output the right data in case:

- an attribute does not have a current value because earlier records all have an enddate (attribute20)

- an attribute has more than one current value, in which case the query should display the one with the most recent startdate (attribute21)

- an attribute that only base_id 1 has a value for (attribute22)

Let's issue a very simple but typical query against the two tables to retrieve the base record with id 1234 and its current value for attribute 11:

rwijk@ORA11G> select b.id
2 , b.description
3 , f11.value
4 from base b
5 , flexattributes_of_base f11
6 where b.id = 1234
7 and b.id = f11.base_id (+)
8 and f11.attribute (+) = 'attribute11'
9 and f11.enddate is null
10 and not exists
11 ( select 'more recent startdate'
12 from flexattributes_of_base f11_2
13 where f11_2.attribute = f11.attribute
14 and f11_2.base_id = f11.base_id
15 and f11_2.startdate > f11.startdate
16 )
17 /

ID DESCRIPTION VALUE
-------- --------------------------------------------------- -----
1234 description1234 3

1 rij is geselecteerd.


Note that the exists-subquery is necessary in case of attribute21, when more than one row exists with an empty enddate.

And if you also want the current value of attribute22, you just repeat the same pattern:

rwijk@ORA11G> select b.id
2 , b.description
3 , f11.value
4 , f22.value
5 from base b
6 , flexattributes_of_base f11
7 , flexattributes_of_base f22
8 where b.id = 1234
9 and b.id = f11.base_id (+)
10 and b.id = f22.base_id (+)
11 and f11.attribute (+) = 'attribute11'
12 and f22.attribute (+) = 'attribute22'
13 and f11.enddate is null
14 and f22.enddate is null
15 and not exists
16 ( select 'more recent startdate'
17 from flexattributes_of_base f11_2
18 where f11_2.attribute = f11.attribute
19 and f11_2.base_id = f11.base_id
20 and f11_2.startdate > f11.startdate
21 )
22 and not exists
23 ( select 'more recent startdate'
24 from flexattributes_of_base f22_2
25 where f22_2.attribute = f22.attribute
26 and f22_2.base_id = f22.base_id
27 and f22_2.startdate > f22.startdate
28 )
29 /

ID DESCRIPTION VALUE VALUE
-------- --------------------------------------------------- ----- -----
1234 description1234 3

1 rij is geselecteerd.


As said these queries are quite standard, so when they had to query the entire base table with 7 attributes (i'll use 5 below), they did it like this initially:

rwijk@ORA11G> select b.id
2 , f19.value f19_value
3 , f20.value f20_value
4 , f21.value f21_value
5 , f22.value f22_value
6 , f23.value f23_value
7 from base b
8 , flexattributes_of_base f19
9 , flexattributes_of_base f20
10 , flexattributes_of_base f21
11 , flexattributes_of_base f22
12 , flexattributes_of_base f23
13 where b.id = f19.base_id (+)
14 and b.id = f20.base_id (+)
15 and b.id = f21.base_id (+)
16 and b.id = f22.base_id (+)
17 and b.id = f23.base_id (+)
18 and f19.attribute (+) = 'attribute19'
19 and f20.attribute (+) = 'attribute20'
20 and f21.attribute (+) = 'attribute21'
21 and f22.attribute (+) = 'attribute22'
22 and f23.attribute (+) = 'attribute23'
23 and f19.enddate (+) is null
24 and f20.enddate (+) is null
25 and f21.enddate (+) is null
26 and f22.enddate (+) is null
27 and f23.enddate (+) is null
28 and not exists
29 ( select 'more recent startdate'
30 from flexattributes_of_base f19_2
31 where f19_2.attribute = f19.attribute
32 and f19_2.base_id = f19.base_id
33 and f19_2.startdate > f19.startdate
34 )
35 and not exists
36 ( select 'more recent startdate'
37 from flexattributes_of_base f20_2
38 where f20_2.attribute = f20.attribute
39 and f20_2.base_id = f20.base_id
40 and f20_2.startdate > f20.startdate
41 )
42 and not exists
43 ( select 'more recent startdate'
44 from flexattributes_of_base f21_2
45 where f21_2.attribute = f21.attribute
46 and f21_2.base_id = f21.base_id
47 and f21_2.startdate > f21.startdate
48 )
49 and not exists
50 ( select 'more recent startdate'
51 from flexattributes_of_base f22_2
52 where f22_2.attribute = f22.attribute
53 and f22_2.base_id = f22.base_id
54 and f22_2.startdate > f22.startdate
55 )
56 and not exists
57 ( select 'more recent startdate'
58 from flexattributes_of_base f23_2
59 where f23_2.attribute = f23.attribute
60 and f23_2.base_id = f23.base_id
61 and f23_2.startdate > f23.startdate
62 )
63 order by b.id
64 /


But here the same table/index is accessed 10 times. One developer noticed that and wrote another query to address this problem. He was able to get rid of the exists-subquery using the row_number analytic function, like this:

rwijk@ORA11G> select b.id
2 , f19.value f19_value
3 , f20.value f20_value
4 , f21.value f21_value
5 , f22.value f22_value
6 , f23.value f23_value
7 from base b
8 , ( select base_id
9 , value
10 , row_number() over (partition by base_id order by startdate desc) rn
11 from flexattributes_of_base
12 where attribute = 'attribute19'
13 and enddate is null
14 ) f19
15 , ( select base_id
16 , value
17 , row_number() over (partition by base_id order by startdate desc) rn
18 from flexattributes_of_base
19 where attribute = 'attribute20'
20 and enddate is null
21 ) f20
22 , ( select base_id
23 , value
24 , row_number() over (partition by base_id order by startdate desc) rn
25 from flexattributes_of_base
26 where attribute = 'attribute21'
27 and enddate is null
28 ) f21
29 , ( select base_id
30 , value
31 , row_number() over (partition by base_id order by startdate desc) rn
32 from flexattributes_of_base
33 where attribute = 'attribute22'
34 and enddate is null
35 ) f22
36 , ( select base_id
37 , value
38 , enddate
39 , row_number() over (partition by base_id order by startdate desc) rn
40 from flexattributes_of_base
41 where attribute = 'attribute23'
42 ) f23
43 where b.id = f19.base_id (+)
44 and b.id = f20.base_id (+)
45 and b.id = f21.base_id (+)
46 and b.id = f22.base_id (+)
47 and b.id = f23.base_id (+)
48 and (f19.rn = 1 or f19.rn is null)
49 and (f20.rn = 1 or f20.rn is null)
50 and (f21.rn = 1 or f21.rn is null)
51 and (f22.rn = 1 or f22.rn is null)
52 and (f23.rn = 1 or f23.rn is null)
53 order by b.id
54 /


This query is a bit better, but still accesses the table 5 times. The number of table scans was reduced to only one, by using a technique that resembles the well known pivot trick:

rwijk@ORA11G> select b.id
2 , f.f19_value
3 , f.f20_value
4 , f.f21_value
5 , f.f22_value
6 , f.f23_value
7 from base b
8 , ( select base_id
9 , max(decode(attribute,'attribute19',value))
10 keep
11 (dense_rank last
12 order by decode(attribute,'attribute19',startdate) nulls first
13 ) f19_value
14 , max(decode(attribute,'attribute20',value))
15 keep
16 (dense_rank last
17 order by decode(attribute,'attribute20',startdate) nulls first
18 ) f20_value
19 , max(decode(attribute,'attribute21',value))
20 keep
21 (dense_rank last
22 order by decode(attribute,'attribute21',startdate) nulls first
23 ) f21_value
24 , max(decode(attribute,'attribute22',value))
25 keep
26 (dense_rank last
27 order by decode(attribute,'attribute22',startdate) nulls first
28 ) f22_value
29 , max(decode(attribute,'attribute23',value))
30 keep
31 (dense_rank last
32 order by decode(attribute,'attribute23',startdate) nulls first
33 ) f23_value
34 from flexattributes_of_base
35 where attribute in ('attribute19','attribute20','attribute21','attribute22','attribute23')
36 and enddate is null
37 group by base_id
38 ) f
39 where b.id = f.base_id (+)
40 order by b.id
41 /


The tkprof output of the three queries:

Query1:

call     count       cpu    elapsed       disk      query    current        rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 1.76 1.77 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 668 1.85 5.95 9622 39430 0 10000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 670 3.62 7.72 9622 39430 0 10000

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 81

Rows Row Source Operation
------- ---------------------------------------------------
10000 SORT ORDER BY (cr=39430 pr=9622 pw=9622 time=218 us cost=9881 size=2274972 card=7738)
10000 HASH JOIN ANTI (cr=39430 pr=9622 pw=9622 time=217 us cost=9389 size=2274972 card=7738)
20000 HASH JOIN RIGHT OUTER (cr=37401 pr=9622 pw=9622 time=1172 us cost=7990 size=2199150 card=8145)
20000 TABLE ACCESS FULL FLEXATTRIBUTES_OF_BASE (cr=2392 pr=72 pw=72 time=361 us cost=669 size=35972 card=1058)
10000 HASH JOIN ANTI (cr=35009 pr=9550 pw=9550 time=198 us cost=7320 size=1922220 card=8145)
10000 HASH JOIN RIGHT OUTER (cr=32980 pr=9550 pw=9550 time=870 us cost=5939 size=1817688 card=8574)
10000 TABLE ACCESS FULL FLEXATTRIBUTES_OF_BASE (cr=2389 pr=2386 pw=2386 time=669 us cost=669 size=34374 card=1011)
10000 HASH JOIN ANTI (cr=30591 pr=7164 pw=7164 time=189 us cost=5270 size=1526172 card=8574)
10000 HASH JOIN RIGHT OUTER (cr=28562 pr=7164 pw=7164 time=830 us cost=3909 size=1389850 card=9025)
0 TABLE ACCESS FULL FLEXATTRIBUTES_OF_BASE (cr=2389 pr=2386 pw=2386 time=0 us cost=669 size=18054 card=531)
10000 HASH JOIN ANTI (cr=26173 pr=4778 pw=4778 time=180 us cost=3239 size=1083000 card=9025)
10000 HASH JOIN RIGHT OUTER (cr=24144 pr=4777 pw=4777 time=830 us cost=1900 size=912000 card=9500)
0 TABLE ACCESS FULL FLEXATTRIBUTES_OF_BASE (cr=2389 pr=2386 pw=2386 time=0 us cost=669 size=9010 card=265)
10000 HASH JOIN ANTI (cr=21755 pr=2391 pw=2391 time=175 us cost=1231 size=589000 card=9500)
10000 HASH JOIN RIGHT OUTER (cr=2414 pr=2386 pw=2386 time=979 us cost=676 size=380000 card=10000)
1 TABLE ACCESS FULL FLEXATTRIBUTES_OF_BASE (cr=2389 pr=2386 pw=2386 time=0 us cost=669 size=9010 card=265)
10000 INDEX FAST FULL SCAN BASE_PK (cr=25 pr=0 pw=0 time=199 us cost=7 size=40000 card=10000)(object id 72460)
430001 INDEX FAST FULL SCAN FLEX_UK1 (cr=19341 pr=5 pw=5 time=7911 us cost=551 size=10320024 card=430001)(object id 72462)
430001 INDEX FAST FULL SCAN FLEX_UK1 (cr=2029 pr=1 pw=1 time=7335 us cost=551 size=10320024 card=430001)(object id 72462)
430001 INDEX FAST FULL SCAN FLEX_UK1 (cr=2029 pr=0 pw=0 time=7321 us cost=551 size=10320024 card=430001)(object id 72462)
430001 INDEX FAST FULL SCAN FLEX_UK1 (cr=2029 pr=0 pw=0 time=7338 us cost=551 size=10320024 card=430001)(object id 72462)
430001 INDEX FAST FULL SCAN FLEX_UK1 (cr=2029 pr=0 pw=0 time=7420 us cost=551 size=10320024 card=430001)(object id 72462)


Query2:

call     count       cpu    elapsed       disk      query    current        rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.32 0.34 0 0 0 0
Execute 1 0.01 0.00 0 0 0 0
Fetch 668 2.15 2.21 2 105894 0 10000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 670 2.50 2.55 2 105894 0 10000

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 81

Rows Row Source Operation
------- ---------------------------------------------------
10000 SORT ORDER BY (cr=105894 pr=2 pw=2 time=200 us cost=1588 size=211 card=1)
10000 FILTER (cr=105894 pr=2 pw=2 time=19897 us)
20000 NESTED LOOPS OUTER (cr=105894 pr=2 pw=2 time=34524 us cost=1587 size=211 card=1)
10000 FILTER (cr=65792 pr=2 pw=2 time=12472 us)
10000 NESTED LOOPS OUTER (cr=65792 pr=2 pw=2 time=11293 us cost=1582 size=172 card=1)
10000 FILTER (cr=34710 pr=2 pw=2 time=5387 us)
10000 NESTED LOOPS OUTER (cr=34710 pr=2 pw=2 time=4959 us cost=1572 size=266 card=2)
10000 FILTER (cr=4809 pr=2 pw=2 time=570 us)
10000 HASH JOIN OUTER (cr=4809 pr=2 pw=2 time=210 us cost=1397 size=3290 card=35)
10000 FILTER (cr=2417 pr=2 pw=2 time=1142 us)
10000 HASH JOIN RIGHT OUTER (cr=2417 pr=2 pw=2 time=821 us cost=677 size=32725 card=595)
1 VIEW (cr=2392 pr=2 pw=2 time=0 us cost=670 size=10335 card=265)
1 WINDOW SORT (cr=2392 pr=2 pw=2 time=0 us cost=670 size=9010 card=265)
1 TABLE ACCESS FULL FLEXATTRIBUTES_OF_BASE (cr=2392 pr=2 pw=2 time=0 us cost=669 size=9010 card=265)
10000 INDEX FAST FULL SCAN BASE_PK (cr=25 pr=0 pw=0 time=179 us cost=7 size=160000 card=10000)(object id 72460)
0 VIEW (cr=2392 pr=0 pw=0 time=0 us cost=719 size=207012 card=5308)
0 WINDOW SORT (cr=2392 pr=0 pw=0 time=0 us cost=719 size=180472 card=5308)
0 TABLE ACCESS FULL FLEXATTRIBUTES_OF_BASE (cr=2392 pr=0 pw=0 time=0 us cost=667 size=180472 card=5308)
0 VIEW PUSHED PREDICATE (cr=29901 pr=0 pw=0 time=0 us cost=5 size=39 card=1)
0 WINDOW SORT (cr=29901 pr=0 pw=0 time=0 us cost=5 size=34 card=1)
0 TABLE ACCESS BY INDEX ROWID FLEXATTRIBUTES_OF_BASE (cr=29901 pr=0 pw=0 time=0 us cost=4 size=34 card=1)
10000 INDEX RANGE SCAN FLEX_UK1 (cr=20053 pr=0 pw=0 time=0 us cost=3 size=0 card=1)(object id 72462)
10000 VIEW PUSHED PREDICATE (cr=31082 pr=0 pw=0 time=0 us cost=5 size=39 card=1)
10000 WINDOW SORT (cr=31082 pr=0 pw=0 time=0 us cost=5 size=34 card=1)
10000 TABLE ACCESS BY INDEX ROWID FLEXATTRIBUTES_OF_BASE (cr=31082 pr=0 pw=0 time=0 us cost=4 size=34 card=1)
20000 INDEX RANGE SCAN FLEX_UK1 (cr=20107 pr=0 pw=0 time=25499 us cost=3 size=0 card=2)(object id 72462)
20000 VIEW PUSHED PREDICATE (cr=40102 pr=0 pw=0 time=75748 us cost=5 size=39 card=1)
20000 WINDOW SORT (cr=40102 pr=0 pw=0 time=34334 us cost=5 size=34 card=1)
20000 TABLE ACCESS BY INDEX ROWID FLEXATTRIBUTES_OF_BASE (cr=40102 pr=0 pw=0 time=93885 us cost=4 size=34 card=1)
20000 INDEX RANGE SCAN FLEX_UK1 (cr=20103 pr=0 pw=0 time=26785 us cost=3 size=0 card=2)(object id 72462)


Query 3:

call     count       cpu    elapsed       disk      query    current        rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 668 0.42 0.41 41 3076 0 10000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 670 0.43 0.41 41 3076 0 10000

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 81

Rows Row Source Operation
------- ---------------------------------------------------
10000 MERGE JOIN OUTER (cr=3076 pr=41 pw=41 time=1922 us cost=763 size=1180000 card=10000)
10000 INDEX FULL SCAN BASE_PK (cr=684 pr=0 pw=0 time=222 us cost=21 size=40000 card=10000)(object id 72460)
10000 SORT JOIN (cr=2392 pr=41 pw=41 time=0 us cost=742 size=307344 card=2696)
10000 VIEW (cr=2392 pr=41 pw=41 time=553 us cost=670 size=307344 card=2696)
10000 SORT GROUP BY (cr=2392 pr=41 pw=41 time=225 us cost=670 size=91664 card=2696)
30001 TABLE ACCESS FULL FLEXATTRIBUTES_OF_BASE (cr=2392 pr=41 pw=41 time=1291 us cost=669 size=106420 card=3130)


The query runtime was drastically reduced by this trick. Not exactly rocket science, but very effective.

No comments:

Post a Comment