Friday, May 30, 2008

String aggregation with the model clause

There are lots of ways to do string aggregation. My favourite one is by using the model clause. Although you have to be aware of the increased PGA memory consumption. This can be a problem when you plan to use the query in multi user applications. A rare case if you ask me. It can also be a problem when a single session doesn't have enough PGA memory available. In the latter case it will read and write to temp, slowing the query down considerably. Apart from that, I can only see advantages: it is the fastest solution, quite readable if you are a bit familiar with the model clause and it doesn't need auxiliary objects like SQL types or functions.

In 11g:

rwijk@ORA11G> select deptno
2 , rtrim(ename,',') enames
3 from ( select deptno
4 , ename
5 , rn
6 from emp
7 model
8 partition by (deptno)
9 dimension by (row_number() over
10 (partition by deptno order by ename) rn
11 )
12 measures (cast(ename as varchar2(40)) ename)
13 rules
14 ( ename[any] order by rn desc = ename[cv()]||','||ename[cv()+1]
15 )
16 )
17 where rn = 1
18 order by deptno
19 /

DEPTNO ENAMES
---------- ----------------------------------------
10 CLARK,KING,MILLER
20 ADAMS,FORD,JONES,SCOTT,SMITH
30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

3 rijen zijn geselecteerd.


First it indexes each ename in alphabetic order with the row_number analytic function. In department 10, ename[1] = 'CLARK', ename[2] = 'KING' and ename[3] = 'MILLER'. For each ename in reverse index order the enames are concatenated. ename[3] will be 'MILLER,', ename[2] will become 'KING,MILLER,' and ename[1] will become 'CLARK,KING,MILLER,'. In each partition, the ename cells with index value 1 are the ones we want containing the concatenated result. As a last step, the last comma has to be cut off. Also note that the ename column is expanded to a VARCHAR2(40) instead of its VARCHAR2(10) to be able to contain the concatenated names.

This query works in 10.2.0.1 and 10.2.0.3 as well. On 10.2.0.4 however:

SQL> select deptno
2 , rtrim(ename,',') enames
3 from ( select deptno
4 , ename
5 , rn
6 from emp
7 model
8 partition by (deptno)
9 dimension by (row_number() over
10 (partition by deptno order by ename) rn
11 )
12 measures (cast(ename as varchar2(40)) ename)
13 rules
14 ( ename[any] order by rn desc = ename[cv()]||','||ename[cv()+1]
15 )
16 )
17 where rn = 1
18 order by deptno
19 /
measures (cast(ename as varchar2(40)) ename)
*
ERROR at line 12:
ORA-25137: Data value out of range


A bug.

When casting the ename, datatype VARCHAR2(10), to a VARCHAR2(40), or any other length, this ORA-25137 appears. A workaround is to split the original ename measure from the string that is to contain the concatenated result. By this the memory consumption is increased a bit more, so I'm not fond of the workaround at all, but it works:

SQL> select deptno
2 , rtrim(enames,',') enames
3 from ( select deptno
4 , enames
5 , rn
6 from emp
7 model
8 partition by (deptno)
9 dimension by (row_number() over
10 (partition by deptno order by ename) rn
11 )
12 measures (ename, cast(null as varchar2(40)) enames)
13 rules
14 ( enames[any] order by rn desc = ename[cv()]||','||enames[cv()+1]
15 )
16 )
17 where rn = 1
18 order by deptno
19 /

DEPTNO ENAMES
---------- ---------------------------------------
10 CLARK,KING,MILLER
20 ADAMS,FORD,JONES,SCOTT,SMITH
30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

3 rows selected.

6 comments:

  1. Good stuff, Rob - a model example of a nice, understandable walkthrough of a clause that can be quite hard to follow initially.

    ReplyDelete
  2. congrats,
    really excellent post!
    Thank you!

    ReplyDelete
  3. In the comment section of this blogpost http://rwijk.blogspot.com/2009/01/sql-model-clause-tutorial-part-three.html, you'll see a great alternative query by leeso for when you are on 10.2.0.4 or 10.2.0.5.

    ReplyDelete
  4. Rob, Brilliant stuff. Works perfectly for me. Is this query runs in SQL server?

    ReplyDelete