I can do two select queries:
- select the top row customer_id
- select all rows that match the above customer_id
But instead, I want to know if I can do the above in a single query?
I know it’s possible using a subquery as in
SELECT * FROM customers
WHERE cust_id=(SELECT cust_id FROM customers LIMIT 1)
Is this an efficient way, or there is something better?
The requirement is to archive records that belong to a random cust_id or it might be correct for it to be the oldest cust_id (using aws lambda function in Python which runs periodically). Hence, I would like to fetch a cust_id and every other row with the same cust_id in a single transaction.
Using Postgres 10.5, with the DB table definition as below.
id BIGINT PRIMARY KEY,
cust_id VARCHAR(100) NOT NULL,
created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
cust_data JSONB
Sample input
|id|cust_id| cust_data | created |
|--|-------|------------------------|--------------------|
|8 |cust9 |{"a": {"b": "c"}} |25/11/2022 01:05:39|
|25|cust1 |{"x": "y"} |05/12/2022 14:59:21|
|40|cust9 |{"d": {"b": {"c": "z"}}}|07/12/2022 11:29:14|
|87|cust2 |{"r": {"s": "t"}} |13/12/2022 21:10:18|
|99|cust1 |{"p": "q"} |20/12/2022 14:59:21|
Expected output
|id|cust_id| cust_data | created |
|--|-------|------------------------|--------------------|
|8 |cust9 |{"a": {"b": "c"}} |25/11/2022 01:05:39|
|40|cust9 |{"d": {"b": {"c": "z"}}}|07/12/2022 11:29:14|
OR
Expected output
|id|cust_id| cust_data | created |
|--|-------|------------------------|--------------------|
|25|cust1 |{"x": "y"} |05/12/2022 14:59:21|
|99|cust1 |{"p": "q"} |20/12/2022 14:59:21|
OR
Expected output
|id|cust_id| cust_data | created |
|--|-------|------------------------|--------------------|
|87|cust2 |{"r": {"s": "t"}} |13/12/2022 21:10:18|
During one query, we only want one of these expected outputs. Also need to mention that the created here is not the actual timestamp when this entry was created, as the entries made into this table are copied from another. So I think we can’t decide which is the cust_id that has got the oldest entry.
2
Answers
What you are doing is correct.
Just make sure to use ORDER BY to ensure that the cust_id is not random.
To get all rows for a single, arbitrary customer in a single query,
WITH TIES
in Postgres 13 or later should be most efficient:If the table is big, support this query with an index on
(cust_id)
.See:
Postgres 10 has reached EOL in 2022, so you need to upgrade ASAP anyway.