Friday, January 30, 2009

COUNT DISTINCT

This morning I saw an e-mail with the title "SQL puzzle?" in my inbox. That's a good title to make me read that one first. The poster was stunned by the results of the following two queries:

SQL> SELECT COUNT(DISTINCT bdl.nrl_id)
2 FROM a_table bdl
3 WHERE bdl.periode >= to_date('20081023092701','yyyymmddhh24miss')
4 ;

COUNT(DISTINCTBDL.NRL_ID)
-------------------------
61111

1 rij is geselecteerd.

SQL> SELECT COUNT(*)
2 FROM (SELECT DISTINCT bdl.nrl_id
3 FROM a_table bdl
4 WHERE bdl.periode >= to_date('20081023092701','yyyymmddhh24miss')
5 )
6 ;

COUNT(*)
----------
61112

1 rij is geselecteerd.


Of course it's possible that a record was inserted in the meantime, but that was not the case here.

The explanation was quite simple. See this example:

rwijk@ORA11GR1> select comm from emp
2 /

COMM
----------

300
500

1400




0





14 rijen zijn geselecteerd.

rwijk@ORA11GR1> select distinct comm from emp
2 /

COMM
----------

1400
500
300
0

5 rijen zijn geselecteerd.

rwijk@ORA11GR1> select count(*) from (select distinct comm from emp)
2 /

COUNT(*)
----------
5

1 rij is geselecteerd.

rwijk@ORA11GR1> select count(distinct comm) from emp
2 /

COUNT(DISTINCTCOMM)
-------------------
4

1 rij is geselecteerd.


Again those pesky null values. From the documentation:
"All aggregate functions except COUNT(*), GROUPING, and GROUPING_ID ignore nulls."
A distinct doesn't ignore nulls, and the count(*) on top of it doesn't either.

At first glance, a "select count(distinct ...)" appears equal to a "select count(*) from (select distinct ...)", but as shown: it is not.



UPDATE


I forgot to mention that a "select count(comm) from (select distinct comm...)" [note: count(comm) instead of count(*)] is equal to a "select count(distinct comm) ...":

rwijk@ORA11GR1> select count(comm) from (select distinct comm from emp);

COUNT(COMM)
-----------
4

1 rij is geselecteerd.

2 comments: