How to fix 'must be an aggregate expression or appear in GROUP BY clause' with 'as' clause in Presto? - PhpOut

YannStoneman
November 17, 2022
248 views
2 votes
3 Answers

This question is similar to another one, but I’m providing a simpler example. The other query was too advanced to make sense to me.

Sample (fake) data

id	gender	kg
4f5a07ca-02e0-8981-3c30-4d9924a169a3	male	103
4f5a07ca-02e0-8981-3c30-4d9924a169a3	male	85
4f5a07ca-02e0-8981-3c30-4d9924a169a3	male	469
e05d54e9-8292-b26c-5618-8a3712b4fc44	female	33

Desired outcome

id	gender	kg
4f5a07ca-02e0-8981-3c30-4d9924a169a3	male	85
e05d54e9-8292-b26c-5618-8a3712b4fc44	female	33
e05d54e9-8292-b26c-5618-8a3712b4fc44	female	36
01f8bbfd-cfc6-3b97-8bc1-8da6f0b4a9a8	female	92

(Goal is having the same id only show up once, and just picking the first match, given an ordering by kg)

QUERY:

SELECT 
    p.id,
    p.gender,
    p.kg
FROM patient p
ORDER BY p.kg
GROUP BY 1

Error:

‘p.gender’ must be an aggregate expression or appear in GROUP BY clause

And if I change it to GROUP BY 1, 2, I get the same error, one column over:

‘p.kg’ must be an aggregate expression or appear in GROUP BY clause

How can I solve this?

Tags: amazon-athena amazon-web-services ansi-sql presto sql

Answers

- Gerballi
- November 17, 2022 at 4:04 am
- 0 votes
0
You can use an aggregate function to get min/max/avg kg. Otherwise you need to replace p.kg with a subquery returning a single version, something like:
```
SELECT 
    p.id,
    p.gender,
    (select top 1 p.kg from patient where patient.id=p.id)
FROM patient p
ORDER BY p.id
```
Login or Signup to reply.

- HaleemurAli
- November 17, 2022 at 4:31 am
- 0 votes
0
what you need here is a window function first_value combined with distinct, rather than group-by / aggregate.

the window function + distinct pattern can sometimes be used to replicate aggregate query
```
select distinct 
  id
, first_value(gender) over (partition by id order by kg) gender
, first_value(kg) over (partition by id order by kg) kg
from patient
```
Login or Signup to reply.

- GuruStron
- November 17, 2022 at 8:29 pm
- 0 votes
0
"Ordering by kg" combined with selecting only one value sounds awfully like min/max aggregate function. In this case min (for gender you can use min_by or arbitrary):
```
SELECT 
    p.id,
    min_by(p.gender, p.kg),
    min(p.kg)
FROM patient p
GROUP BY p.id
```
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.