I am using PostgreSQL
If I want to get customer.first_name, customer.last_name or any other column in addition to customer_id and sum(total), I am getting error as shown below:
select i.customer_id, cu.first_name, cu.last_name, sum(total) as amount from invoice i
join customer cu
on i.customer_id = cu.customer_id
group by i.customer_id
order by amount desc
ERROR: column "i.customer_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select i.customer_id, cu.first_name, cu.last_name, sum(total...
^
SQL state: 42803
Character: 8
So, I was just experimenting and found that when I wrote the query in the following query, it’s working fine:
select cu.customer_id, cu.first_name, cu.last_name, sum(total) as amount from invoice i
join customer cu
on i.customer_id = cu.customer_id
group by cu.customer_id
order by amount desc
Why is it so?
Even when there is only one table and I select other columns also which are not used to create group by, I get the same error.
Then how can I select other columns also while using group by?
2
Answers
You are trying to select additional columns (cu.first_name, cu.last_name) along with the aggregated column (sum(total)) without including them in the GROUP BY clause or using an aggregate function on them. In standard SQL, when you use GROUP BY, you can only include columns in the SELECT clause that are either part of the GROUP BY clause or are being used in aggregate functions like SUM, COUNT, AVG, etc. This is because when you group data, the database needs to know how to aggregate the non-grouped columns.
cu.customer_id is an identifier and is enough, for postgres sql to know how to group data
in mysql there is a function that allow you to write your query without group by by disabling ONLY_FULL_GROUP_BY, but in postgresql there is not such behavior
The second query works because all the columns in
customer
table are functionally dependent oncustomer_id(PK)
(even if you haven’t posted your table description).In simple words , every record is unique in
customer_id
column and it doesn’t matter how many columns you add from thecustomer
table because every row would be returned.Second query would fail if
customer_id
isn’t declared as primary key,See this example for more info.