I was solving some SQL tasks and got confused with the following thing:
CREATE TABLE data (
len INT
);
INSERT INTO data (len)
VALUES
(NULL),
(100),
(200),
(300),
(400);
suppose we’ve created such a table and we wanna count all values that are less than 200. I am using this online compiler to do it.
my first attempt gives 1 as expected
SELECT COUNT(*) FILTER (WHERE len < 200)
FROM data
then I thought whether I can simplify it and just do
SELECT COUNT(len < 200) FROM data
but it gives 4 as an answer.
So my question is: can we use logic expressions inside of aggregate functions? If the answer is NO, why?
2
Answers
COUNT(expression)
computes the number of rows for whichexpression
is notNULL
.COUNT(*)
computes the total number of rows because*
is the row value, which is neverNULL
for any row, even if every column in a row is NULL. The<
operator can return one of three results:TRUE
,FALSE
, orNULL
. Keep in mind thatNULL
is not a value, it is a state. The expressionCOUNT(len < 200)
only returnsNULL
whenlen
isNULL
; therefore, any valuedlen
will contribute to the count.COUNT(NULL)
will always return 0 regardless of the number of rows.count(*)
counts all input rows regardless of their contents. Period. That’s according to the SQL standard, and does not depend on the meaning of the symbol*
in other contexts.Postgres has a separate (faster) implementation for
count(*)
. The manual:The definition of the variant
count("any")
also answers why your 2nd query counts all rows with not-nulllen
.To actually simplify your query – and make it faster, or dramatically faster for big tables and selective criteria matching an index:
See: