skip to Main Content

I can do two select queries:

  1. select the top row customer_id
  2. select all rows that match the above customer_id

But instead, I want to know if I can do the above in a single query?

I know it’s possible using a subquery as in

SELECT * FROM customers
WHERE cust_id=(SELECT cust_id FROM customers LIMIT 1)

Is this an efficient way, or there is something better?

The requirement is to archive records that belong to a random cust_id or it might be correct for it to be the oldest cust_id (using aws lambda function in Python which runs periodically). Hence, I would like to fetch a cust_id and every other row with the same cust_id in a single transaction.

Using Postgres 10.5, with the DB table definition as below.

id BIGINT PRIMARY KEY,
cust_id VARCHAR(100) NOT NULL,
created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
cust_data JSONB

Sample input

|id|cust_id|     cust_data          |     created        |
|--|-------|------------------------|--------------------|
|8 |cust9  |{"a": {"b": "c"}}       |25/11/2022  01:05:39|
|25|cust1  |{"x": "y"}              |05/12/2022  14:59:21|
|40|cust9  |{"d": {"b": {"c": "z"}}}|07/12/2022  11:29:14|
|87|cust2  |{"r": {"s": "t"}}       |13/12/2022  21:10:18|
|99|cust1  |{"p": "q"}              |20/12/2022  14:59:21|

Expected output

|id|cust_id|     cust_data          |     created        |
|--|-------|------------------------|--------------------|
|8 |cust9  |{"a": {"b": "c"}}       |25/11/2022  01:05:39|
|40|cust9  |{"d": {"b": {"c": "z"}}}|07/12/2022  11:29:14|

OR

Expected output

|id|cust_id|     cust_data          |     created        |
|--|-------|------------------------|--------------------|
|25|cust1  |{"x": "y"}              |05/12/2022  14:59:21|
|99|cust1  |{"p": "q"}              |20/12/2022  14:59:21|

OR

Expected output

|id|cust_id|     cust_data          |     created        |
|--|-------|------------------------|--------------------|
|87|cust2  |{"r": {"s": "t"}}       |13/12/2022  21:10:18|

During one query, we only want one of these expected outputs. Also need to mention that the created here is not the actual timestamp when this entry was created, as the entries made into this table are copied from another. So I think we can’t decide which is the cust_id that has got the oldest entry.

2

Answers


  1. What you are doing is correct.

    Just make sure to use ORDER BY to ensure that the cust_id is not random.

    SELECT * FROM customers
    WHERE cust_id=(SELECT cust_id FROM customers ORDER BY DATE LIMIT 1)
    
    Login or Signup to reply.
  2. To get all rows for a single, arbitrary customer in a single query, WITH TIES in Postgres 13 or later should be most efficient:

    SELECT *
    FROM   customers
    ORDER  BY cust_id
    FETCH  FIRST 1 ROWS WITH TIES;
    

    If the table is big, support this query with an index on (cust_id).

    See:

    Postgres 10 has reached EOL in 2022, so you need to upgrade ASAP anyway.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search