Wednesday, October 13, 2010

Aggregating over zero rows

In my presentation and paper about grouping last year, I mentioned that an aggregation query without a group by clause is the same as grouping by the empty grouping set. So this query:

SQL> select count(*)
2 from emp
3 /

COUNT(*)
----------
14

1 rij is geselecteerd.

is the same as:

SQL> select count(*)
2 from emp
3 group by ()
4 /

COUNT(*)
----------
14

1 rij is geselecteerd.

However, this is not always true (also known as "false"). It is not true when you aggregate over zero rows. If I repeat the query above, but filter on deptno = 40 (there are no employees in that department), then there is an odd difference:

SQL> select count(*)
2 from emp
3 where deptno = 40
4 /

COUNT(*)
----------
0

1 rij is geselecteerd.

SQL> select count(*)
2 from emp
3 where deptno = 40
4 group by ()
5 /

Er zijn geen rijen geselecteerd.

It's not that the empty grouping set is weird or anything, it's the fact that a group by clause is present. The documentation states:

"In a query containing a GROUP BY clause, the elements of the select list can be aggregate functions, GROUP BY expressions, constants, or expressions involving one of these. Oracle applies the aggregate functions to each group of rows and returns a single result row for each group.

If you omit the GROUP BY clause, then Oracle applies aggregate functions in the select list to all the rows in the queried table or view."


So it seems that the internal implementation looks something like this in pseudocode:

if not "group by clause present"
then
return exactly one row
else
return as much rows as there are groups
end if


Even if we are grouping by a regular column instead of the empty grouping set, we get a "No rows selected":

SQL> select deptno
2 , count(*)
3 from emp
4 where deptno = 40
5 group by deptno
6 /

Er zijn geen rijen geselecteerd.

We can also see the same behaviour when using the group by extensions, like rollup and cube. They typically show a grand total, like in:

SQL> select deptno
2 , count(*)
3 from emp
4 group by rollup(deptno)
5 /

DEPTNO COUNT(*)
---------- ----------
10 3
20 5
30 6
14

4 rijen zijn geselecteerd.

But when there are zero rows, not only the regular group rows disappear, but also the grand total:

SQL> select deptno
2 , count(*)
3 from emp
4 where deptno = 40
5 group by rollup(deptno)
6 /

Er zijn geen rijen geselecteerd.

Which is now less of a surprise, because this query is equivalent to:

SQL> select deptno
2 , count(*)
3 from emp
4 where deptno = 40
5 group by grouping sets (deptno,())
6 /

Er zijn geen rijen geselecteerd.

Which equals:

SQL> select deptno
2 , count(*)
3 from emp
4 where deptno = 40
5 group by deptno
6 union all
7 select null
8 , count(*)
9 from emp
10 where deptno = 40
11 group by ()
12 /

Er zijn geen rijen geselecteerd.

And then we recognize the queries listed earlier.

So the behaviour is consistent, but it surprised me, even though it is clearly documented.