For example, let's start with this query:
rwijk@ORA11GR1> select deptno
2 , job
3 , empno
4 , ename
5 , sum(sal) sumsal
6 , case grouping_id(deptno,job,empno)
7 when 0 then 'grouped by deptno,job,empno,ename'
8 when 1 then 'grouped by deptno,job'
9 when 3 then 'grouped by deptno'
10 when 7 then 'grouped by ()'
11 end gr_text
12 from emp
13 group by rollup(deptno,job,(empno,ename))
14 order by deptno
15 , job
16 , empno
17 /
DEPTNO JOB EMPNO ENAME SUMSAL GR_TEXT
------ --------- ------ ---------- ------ ---------------------------------
10 CLERK 7934 MILLER 1300 grouped by deptno,job,empno,ename
10 CLERK 1300 grouped by deptno,job
10 MANAGER 7782 CLARK 2450 grouped by deptno,job,empno,ename
10 MANAGER 2450 grouped by deptno,job
10 PRESIDENT 7839 KING 5000 grouped by deptno,job,empno,ename
10 PRESIDENT 5000 grouped by deptno,job
10 8750 grouped by deptno
20 ANALYST 7788 SCOTT 3000 grouped by deptno,job,empno,ename
20 ANALYST 7902 FORD 3000 grouped by deptno,job,empno,ename
20 ANALYST 6000 grouped by deptno,job
20 CLERK 7369 SMITH 800 grouped by deptno,job,empno,ename
20 CLERK 7876 ADAMS 1100 grouped by deptno,job,empno,ename
20 CLERK 1900 grouped by deptno,job
20 MANAGER 7566 JONES 2975 grouped by deptno,job,empno,ename
20 MANAGER 2975 grouped by deptno,job
20 10875 grouped by deptno
30 CLERK 7900 JAMES 950 grouped by deptno,job,empno,ename
30 CLERK 950 grouped by deptno,job
30 MANAGER 7698 BLAKE 2850 grouped by deptno,job,empno,ename
30 MANAGER 2850 grouped by deptno,job
30 SALESMAN 7499 ALLEN 1600 grouped by deptno,job,empno,ename
30 SALESMAN 7521 WARD 1250 grouped by deptno,job,empno,ename
30 SALESMAN 7654 MARTIN 1250 grouped by deptno,job,empno,ename
30 SALESMAN 7844 TURNER 1500 grouped by deptno,job,empno,ename
30 SALESMAN 5600 grouped by deptno,job
30 9400 grouped by deptno
29025 grouped by ()
27 rijen zijn geselecteerd.
A rollup with N arguments always leads to N+1 grouping sets. So here the rollup has 3 arguments and you can see 4 distinct values in the last gr_text column.
Now let's say we want the grouping sets () and deptno duplicated, leading to 4 extra rows. With rollup you can specify deptno twice:
rwijk@ORA11GR1> select deptno
2 , job
3 , empno
4 , ename
5 , sum(sal) sumsal
6 , case grouping_id(deptno,job,empno)
7 when 0 then 'grouped by deptno,job,empno,ename'
8 when 1 then 'grouped by deptno,job'
9 when 3 then 'grouped by deptno'
10 when 7 then 'grouped by ()'
11 end gr_text
12 from emp
13 group by rollup(deptno,deptno,job,(empno,ename))
14 order by deptno
15 , job
16 , empno
17 /
DEPTNO JOB EMPNO ENAME SUMSAL GR_TEXT
------ --------- ------ ---------- ------ ---------------------------------
10 CLERK 7934 MILLER 1300 grouped by deptno,job,empno,ename
10 CLERK 1300 grouped by deptno,job
10 MANAGER 7782 CLARK 2450 grouped by deptno,job,empno,ename
10 MANAGER 2450 grouped by deptno,job
10 PRESIDENT 7839 KING 5000 grouped by deptno,job,empno,ename
10 PRESIDENT 5000 grouped by deptno,job
10 8750 grouped by deptno
10 8750 grouped by deptno
20 ANALYST 7788 SCOTT 3000 grouped by deptno,job,empno,ename
20 ANALYST 7902 FORD 3000 grouped by deptno,job,empno,ename
20 ANALYST 6000 grouped by deptno,job
20 CLERK 7369 SMITH 800 grouped by deptno,job,empno,ename
20 CLERK 7876 ADAMS 1100 grouped by deptno,job,empno,ename
20 CLERK 1900 grouped by deptno,job
20 MANAGER 7566 JONES 2975 grouped by deptno,job,empno,ename
20 MANAGER 2975 grouped by deptno,job
20 10875 grouped by deptno
20 10875 grouped by deptno
30 CLERK 7900 JAMES 950 grouped by deptno,job,empno,ename
30 CLERK 950 grouped by deptno,job
30 MANAGER 7698 BLAKE 2850 grouped by deptno,job,empno,ename
30 MANAGER 2850 grouped by deptno,job
30 SALESMAN 7499 ALLEN 1600 grouped by deptno,job,empno,ename
30 SALESMAN 7521 WARD 1250 grouped by deptno,job,empno,ename
30 SALESMAN 7654 MARTIN 1250 grouped by deptno,job,empno,ename
30 SALESMAN 7844 TURNER 1500 grouped by deptno,job,empno,ename
30 SALESMAN 5600 grouped by deptno,job
30 9400 grouped by deptno
30 9400 grouped by deptno
29025 grouped by ()
30 rijen zijn geselecteerd.
But there is no way to specify the empty grouping set, as it is implicit with the rollup operator. And so it's also impossible to duplicate this empty grouping set with rollup. It is possible though with the grouping sets notation. First, let's rewrite the original rollup expression to the more tedious but clearer grouping sets notation, like this:
rwijk@ORA11GR1> select deptno
2 , job
3 , empno
4 , ename
5 , sum(sal) sumsal
6 , case grouping_id(deptno,job,empno)
7 when 0 then 'grouped by deptno,job,empno,ename'
8 when 1 then 'grouped by deptno,job'
9 when 3 then 'grouped by deptno'
10 when 7 then 'grouped by ()'
11 end gr_text
12 from emp
13 group by grouping sets
14 ( (deptno,job,empno,ename)
15 , (deptno,job)
16 , deptno
17 , ()
18 )
19 order by deptno
20 , job
21 , empno
22 /
DEPTNO JOB EMPNO ENAME SUMSAL GR_TEXT
------ --------- ------ ---------- ------ ---------------------------------
10 CLERK 7934 MILLER 1300 grouped by deptno,job,empno,ename
10 CLERK 1300 grouped by deptno,job
10 MANAGER 7782 CLARK 2450 grouped by deptno,job,empno,ename
10 MANAGER 2450 grouped by deptno,job
10 PRESIDENT 7839 KING 5000 grouped by deptno,job,empno,ename
10 PRESIDENT 5000 grouped by deptno,job
10 8750 grouped by deptno
20 ANALYST 7788 SCOTT 3000 grouped by deptno,job,empno,ename
20 ANALYST 7902 FORD 3000 grouped by deptno,job,empno,ename
20 ANALYST 6000 grouped by deptno,job
20 CLERK 7369 SMITH 800 grouped by deptno,job,empno,ename
20 CLERK 7876 ADAMS 1100 grouped by deptno,job,empno,ename
20 CLERK 1900 grouped by deptno,job
20 MANAGER 7566 JONES 2975 grouped by deptno,job,empno,ename
20 MANAGER 2975 grouped by deptno,job
20 10875 grouped by deptno
30 CLERK 7900 JAMES 950 grouped by deptno,job,empno,ename
30 CLERK 950 grouped by deptno,job
30 MANAGER 7698 BLAKE 2850 grouped by deptno,job,empno,ename
30 MANAGER 2850 grouped by deptno,job
30 SALESMAN 7499 ALLEN 1600 grouped by deptno,job,empno,ename
30 SALESMAN 7521 WARD 1250 grouped by deptno,job,empno,ename
30 SALESMAN 7654 MARTIN 1250 grouped by deptno,job,empno,ename
30 SALESMAN 7844 TURNER 1500 grouped by deptno,job,empno,ename
30 SALESMAN 5600 grouped by deptno,job
30 9400 grouped by deptno
29025 grouped by ()
27 rijen zijn geselecteerd.
then it becomes very easy to duplicate the deptno and the empty grouping set. Just duplicate them in your grouping sets list of arguments:
rwijk@ORA11GR1> select deptno
2 , job
3 , empno
4 , ename
5 , sum(sal) sumsal
6 , case grouping_id(deptno,job,empno)
7 when 0 then 'grouped by deptno,job,empno,ename'
8 when 1 then 'grouped by deptno,job'
9 when 3 then 'grouped by deptno'
10 when 7 then 'grouped by ()'
11 end gr_text
12 from emp
13 group by grouping sets
14 ( (deptno,job,empno,ename)
15 , (deptno,job)
16 , deptno
17 , deptno
18 , ()
19 , ()
20 )
21 order by deptno
22 , job
23 , empno
24 /
DEPTNO JOB EMPNO ENAME SUMSAL GR_TEXT
------ --------- ------ ---------- ------ ---------------------------------
10 CLERK 7934 MILLER 1300 grouped by deptno,job,empno,ename
10 CLERK 1300 grouped by deptno,job
10 MANAGER 7782 CLARK 2450 grouped by deptno,job,empno,ename
10 MANAGER 2450 grouped by deptno,job
10 PRESIDENT 7839 KING 5000 grouped by deptno,job,empno,ename
10 PRESIDENT 5000 grouped by deptno,job
10 8750 grouped by deptno
10 8750 grouped by deptno
20 ANALYST 7788 SCOTT 3000 grouped by deptno,job,empno,ename
20 ANALYST 7902 FORD 3000 grouped by deptno,job,empno,ename
20 ANALYST 6000 grouped by deptno,job
20 CLERK 7369 SMITH 800 grouped by deptno,job,empno,ename
20 CLERK 7876 ADAMS 1100 grouped by deptno,job,empno,ename
20 CLERK 1900 grouped by deptno,job
20 MANAGER 7566 JONES 2975 grouped by deptno,job,empno,ename
20 MANAGER 2975 grouped by deptno,job
20 10875 grouped by deptno
20 10875 grouped by deptno
30 CLERK 7900 JAMES 950 grouped by deptno,job,empno,ename
30 CLERK 950 grouped by deptno,job
30 MANAGER 7698 BLAKE 2850 grouped by deptno,job,empno,ename
30 MANAGER 2850 grouped by deptno,job
30 SALESMAN 7499 ALLEN 1600 grouped by deptno,job,empno,ename
30 SALESMAN 7521 WARD 1250 grouped by deptno,job,empno,ename
30 SALESMAN 7654 MARTIN 1250 grouped by deptno,job,empno,ename
30 SALESMAN 7844 TURNER 1500 grouped by deptno,job,empno,ename
30 SALESMAN 5600 grouped by deptno,job
30 9400 grouped by deptno
30 9400 grouped by deptno
29025 grouped by ()
29025 grouped by ()
31 rijen zijn geselecteerd.
Now my question was: how can I distinguish between the two equal grouping sets in my select list? Not with GROUPING_ID, because I'd have to mention the columns on which were grouped, not the grouping set. A grouping_id(deptno) would yield to 1 in both deptno grouping sets.
Using the row_number analytic function and including all columns present in a grouping set in the partition by clause, was what I thought of next. This would accomplish a "1" for all singular grouping sets, and an arbitrary 1 and 2 within the duplicate grouping sets, like this:
rwijk@ORA11GR1> select deptno
2 , job
3 , empno
4 , ename
5 , sum(sal) sumsal
6 , case grouping_id(deptno,job,empno)
7 when 0 then 'grouped by deptno,job,empno,ename'
8 when 1 then 'grouped by deptno,job'
9 when 3 then 'grouped by deptno, grouping set ' ||
10 to_char(2+row_number() over
(partition by deptno,job,empno order by null))
11 when 7 then 'grouped by (), grouping set ' ||
12 to_char(4+row_number() over
(partition by deptno,job,empno order by null))
13 end gr_text
14 from emp
15 group by grouping sets
16 ( (deptno,job,empno,ename)
17 , (deptno,job)
18 , deptno
19 , deptno
20 , ()
21 , ()
22 )
23 order by deptno
24 , job
25 , empno
26 /
DEPTNO JOB EMPNO ENAME SUMSAL GR_TEXT
------ --------- ------ ---------- ------ ---------------------------------
10 CLERK 7934 MILLER 1300 grouped by deptno,job,empno,ename
10 CLERK 1300 grouped by deptno,job
10 MANAGER 7782 CLARK 2450 grouped by deptno,job,empno,ename
10 MANAGER 2450 grouped by deptno,job
10 PRESIDENT 7839 KING 5000 grouped by deptno,job,empno,ename
10 PRESIDENT 5000 grouped by deptno,job
10 8750 grouped by deptno, grouping set 3
10 8750 grouped by deptno, grouping set 4
20 ANALYST 7788 SCOTT 3000 grouped by deptno,job,empno,ename
20 ANALYST 7902 FORD 3000 grouped by deptno,job,empno,ename
20 ANALYST 6000 grouped by deptno,job
20 CLERK 7369 SMITH 800 grouped by deptno,job,empno,ename
20 CLERK 7876 ADAMS 1100 grouped by deptno,job,empno,ename
20 CLERK 1900 grouped by deptno,job
20 MANAGER 7566 JONES 2975 grouped by deptno,job,empno,ename
20 MANAGER 2975 grouped by deptno,job
20 10875 grouped by deptno, grouping set 3
20 10875 grouped by deptno, grouping set 4
30 CLERK 7900 JAMES 950 grouped by deptno,job,empno,ename
30 CLERK 950 grouped by deptno,job
30 MANAGER 7698 BLAKE 2850 grouped by deptno,job,empno,ename
30 MANAGER 2850 grouped by deptno,job
30 SALESMAN 7499 ALLEN 1600 grouped by deptno,job,empno,ename
30 SALESMAN 7521 WARD 1250 grouped by deptno,job,empno,ename
30 SALESMAN 7654 MARTIN 1250 grouped by deptno,job,empno,ename
30 SALESMAN 7844 TURNER 1500 grouped by deptno,job,empno,ename
30 SALESMAN 5600 grouped by deptno,job
30 9400 grouped by deptno, grouping set 3
30 9400 grouped by deptno, grouping set 4
29025 grouped by (), grouping set 5
29025 grouped by (), grouping set 6
31 rijen zijn geselecteerd.
The problem is the ordering. There is nothing to order by, so I chose NULL. Any other constant value would have sufficed as well. But this ordering is arbitrary, meaning that two rows can both be 1 and can both be 2. And some implementation detail of Oracle has to decide which one it's going to be. And if I use the row_number function twice, would both expressions give the same results? It depends on the same implementation, so yes, both expressions give the same results:
rwijk@ORA11GR1> select case grouping_id(deptno,job,empno)
2 when 0 then 'grouped by deptno,job,empno,ename'
3 when 1 then 'grouped by deptno,job'
4 when 3 then 'grouped by deptno, grouping set ' ||
5 to_char(2+row_number() over
(partition by deptno,job,empno order by null))
6 when 7 then 'grouped by (), grouping set ' ||
7 to_char(4+row_number() over
(partition by deptno,job,empno order by null))
8 end gr_text
9 , case grouping_id(deptno,job,empno)
10 when 0 then 'grouped by deptno,job,empno,ename'
11 when 1 then 'grouped by deptno,job'
12 when 3 then 'grouped by deptno, grouping set ' ||
13 to_char(2+row_number() over
(partition by deptno,job,empno order by null))
14 when 7 then 'grouped by (), grouping set ' ||
15 to_char(4+row_number() over
(partition by deptno,job,empno order by null))
16 end gr_text2
17 from emp
18 group by grouping sets
19 ( (deptno,job,empno,ename)
20 , (deptno,job)
21 , deptno
22 , deptno
23 , ()
24 , ()
25 )
26 order by deptno
27 , job
28 , empno
29 /
GR_TEXT GR_TEXT2
--------------------------------- ---------------------------------
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job grouped by deptno,job
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job grouped by deptno,job
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job grouped by deptno,job
grouped by deptno, grouping set 3 grouped by deptno, grouping set 3
grouped by deptno, grouping set 4 grouped by deptno, grouping set 4
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job grouped by deptno,job
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job grouped by deptno,job
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job grouped by deptno,job
grouped by deptno, grouping set 3 grouped by deptno, grouping set 3
grouped by deptno, grouping set 4 grouped by deptno, grouping set 4
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job grouped by deptno,job
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job grouped by deptno,job
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job,empno,ename grouped by deptno,job,empno,ename
grouped by deptno,job grouped by deptno,job
grouped by deptno, grouping set 3 grouped by deptno, grouping set 3
grouped by deptno, grouping set 4 grouped by deptno, grouping set 4
grouped by (), grouping set 5 grouped by (), grouping set 5
grouped by (), grouping set 6 grouped by (), grouping set 6
31 rijen zijn geselecteerd.
But still I felt uncomfortable with this solution. I was pressing the Previous and Next buttons in the SQL Reference Manual to find any related functions and saw the GROUP_ID function. "This has got to be the most useless function ever" must have been my thoughts when I read about this function years ago. In fact, I had completely forgotten about it. But it's entire reason for being is a case like this with duplicate grouping sets, to be able to distinguish between the sets:
rwijk@ORA11GR1> select deptno
2 , job
3 , empno
4 , ename
5 , sum(sal) sumsal
6 , case grouping_id(deptno,job,empno)
7 when 0 then 'grouped by deptno,job,empno,ename'
8 when 1 then 'grouped by deptno,job'
9 when 3 then 'grouped by deptno'
10 when 7 then 'grouped by ()'
11 end gr_text
12 , group_id() grid
13 from emp
14 group by grouping sets
15 ( (deptno,job,empno,ename)
16 , (deptno,job)
17 , deptno
18 , deptno
19 , ()
20 , ()
21 , ()
22 , ()
23 , ()
24 )
25 order by deptno
26 , job
27 , empno
28 /
DEPTNO JOB EMPNO ENAME SUMSAL GR_TEXT GRID
------ --------- ------ ---------- ------ --------------------------------- ----
10 CLERK 7934 MILLER 1300 grouped by deptno,job,empno,ename 0
10 CLERK 1300 grouped by deptno,job 0
10 MANAGER 7782 CLARK 2450 grouped by deptno,job,empno,ename 0
10 MANAGER 2450 grouped by deptno,job 0
10 PRESIDENT 7839 KING 5000 grouped by deptno,job,empno,ename 0
10 PRESIDENT 5000 grouped by deptno,job 0
10 8750 grouped by deptno 1
10 8750 grouped by deptno 0
20 ANALYST 7788 SCOTT 3000 grouped by deptno,job,empno,ename 0
20 ANALYST 7902 FORD 3000 grouped by deptno,job,empno,ename 0
20 ANALYST 6000 grouped by deptno,job 0
20 CLERK 7369 SMITH 800 grouped by deptno,job,empno,ename 0
20 CLERK 7876 ADAMS 1100 grouped by deptno,job,empno,ename 0
20 CLERK 1900 grouped by deptno,job 0
20 MANAGER 7566 JONES 2975 grouped by deptno,job,empno,ename 0
20 MANAGER 2975 grouped by deptno,job 0
20 10875 grouped by deptno 1
20 10875 grouped by deptno 0
30 CLERK 7900 JAMES 950 grouped by deptno,job,empno,ename 0
30 CLERK 950 grouped by deptno,job 0
30 MANAGER 7698 BLAKE 2850 grouped by deptno,job,empno,ename 0
30 MANAGER 2850 grouped by deptno,job 0
30 SALESMAN 7499 ALLEN 1600 grouped by deptno,job,empno,ename 0
30 SALESMAN 7521 WARD 1250 grouped by deptno,job,empno,ename 0
30 SALESMAN 7654 MARTIN 1250 grouped by deptno,job,empno,ename 0
30 SALESMAN 7844 TURNER 1500 grouped by deptno,job,empno,ename 0
30 SALESMAN 5600 grouped by deptno,job 0
30 9400 grouped by deptno 0
30 9400 grouped by deptno 1
29025 grouped by () 2
29025 grouped by () 3
29025 grouped by () 4
29025 grouped by () 0
29025 grouped by () 1
34 rijen zijn geselecteerd.
Here I have duplicated the empty grouping set 4 times and you see the outcome of the GROUP_ID() function for those five sets ranging from 0 to 4. This is exactly what is needed here, so now the initial query is quite easy, like this:
rwijk@ORA11GR1> select deptno
2 , job
3 , empno
4 , ename
5 , sum(sal) sumsal
6 , case grouping_id(deptno,job,empno)
7 when 0 then 'grouped by deptno,job,empno,ename'
8 when 1 then 'grouped by deptno,job'
9 when 3 then 'grouped by deptno, grouping set ' || to_char(3+group_id())
10 when 7 then 'grouped by (), grouping set ' || to_char(5+group_id())
11 end gr_text
12 from emp
13 group by grouping sets
14 ( (deptno,job,empno,ename)
15 , (deptno,job)
16 , deptno
17 , deptno
18 , ()
19 , ()
20 )
21 order by deptno
22 , job
23 , empno
24 /
DEPTNO JOB EMPNO ENAME SUMSAL GR_TEXT
------ --------- ------ ---------- ------ ---------------------------------
10 CLERK 7934 MILLER 1300 grouped by deptno,job,empno,ename
10 CLERK 1300 grouped by deptno,job
10 MANAGER 7782 CLARK 2450 grouped by deptno,job,empno,ename
10 MANAGER 2450 grouped by deptno,job
10 PRESIDENT 7839 KING 5000 grouped by deptno,job,empno,ename
10 PRESIDENT 5000 grouped by deptno,job
10 8750 grouped by deptno, grouping set 4
10 8750 grouped by deptno, grouping set 3
20 ANALYST 7788 SCOTT 3000 grouped by deptno,job,empno,ename
20 ANALYST 7902 FORD 3000 grouped by deptno,job,empno,ename
20 ANALYST 6000 grouped by deptno,job
20 CLERK 7369 SMITH 800 grouped by deptno,job,empno,ename
20 CLERK 7876 ADAMS 1100 grouped by deptno,job,empno,ename
20 CLERK 1900 grouped by deptno,job
20 MANAGER 7566 JONES 2975 grouped by deptno,job,empno,ename
20 MANAGER 2975 grouped by deptno,job
20 10875 grouped by deptno, grouping set 3
20 10875 grouped by deptno, grouping set 4
30 CLERK 7900 JAMES 950 grouped by deptno,job,empno,ename
30 CLERK 950 grouped by deptno,job
30 MANAGER 7698 BLAKE 2850 grouped by deptno,job,empno,ename
30 MANAGER 2850 grouped by deptno,job
30 SALESMAN 7499 ALLEN 1600 grouped by deptno,job,empno,ename
30 SALESMAN 7521 WARD 1250 grouped by deptno,job,empno,ename
30 SALESMAN 7654 MARTIN 1250 grouped by deptno,job,empno,ename
30 SALESMAN 7844 TURNER 1500 grouped by deptno,job,empno,ename
30 SALESMAN 5600 grouped by deptno,job
30 9400 grouped by deptno, grouping set 3
30 9400 grouped by deptno, grouping set 4
29025 grouped by (), grouping set 5
29025 grouped by (), grouping set 6
31 rijen zijn geselecteerd.
On comparing SQL grouping sets and MapReduce.
ReplyDeleteSee: http://www.data-miners.com/blog/2008/01/mapreduce-and-sql-aggregations.html and http://www.data-miners.com/blog/ .
RC, The Netherlands