skip to Main Content

I am using PostgreSQL DB and have use case where I need to return all the rows from table whose column value is superset of same column value of other preceding rows.

Example – the table has following rows:

-------------------------------------------
id   |   name       |   datetime
-------------------------------------------
1    |   N          |   2024-08-17 04:42:57 
2    |   NEW        |   2024-08-17 04:42:58    
3    |   NEW YOR    |   2024-08-17 04:42:58    
4    |   NEW YORK   |   2024-08-17 04:42:59    
5    |   W          |   2024-08-17 04:56:07    
6    |   WA         |   2024-08-17 04:56:07    
7    |   WASHI      |   2024-08-17 04:56:07    
8    |   WASHINGT   |   2024-08-17 04:56:08   
9    |   WASHINGTON |   2024-08-17 04:56:08   
10   |   NEW Y      |   2024-08-17 05:12:18   
11   |   NEW YORK   |   2024-08-17 05:12:18

I need SQL which can return following result against above table:

-------------------------------------------
id   |   name       |   datetime
-------------------------------------------
4    |   NEW YORK   |   2024-08-17 04:42:59    
9    |   WASHINGTON |   2024-08-17 04:56:08   
11   |   NEW YORK   |   2024-08-17 05:12:18

2

Answers


  1. Your requirement can be translated to:

    "Take rows where the name in the next row does not start with the name of the current row."

    So:

    SELECT id, name, datetime 
    FROM  (
       SELECT *, lead(name, 1, '') OVER (ORDER BY id) AS next_name
       FROM   tbl
       ORDER  BY id
       ) sub
    WHERE  NOT (next_name ^@ name);
    

    fiddle

    Assuming rows are indeed sorted by id as you make it seem, and involved columns are defined NOT NULL.

    Two steps:

    1. In subquery sub, take the name of the next row according to your sort order with the window function lead(), let’s call it next_name.
      To cover the corner case of the last row, default to the empty string (''), when there is no next row.

    2. In the outer SELECT, filter rows where next_name does not start with name. Should be exactly what you ask for.

    ^@ is the "starts with" operator added with Postgres 11. See:

    (Substitute with LIKE or ~ in older versions.)

    Other possible (more verbose) solutions:

    • a recursive CTEs
    • a procedural solution in a PL/pgSQL function.
    Login or Signup to reply.
  2. If the table is not sorted according to name, this query can do it, as it does not rely on window. It is fairly readable, though not very fast, as it needs two subqueries per row:

    SELECT * FROM t
    WHERE EXISTS (
      SELECT * FROM t t2
      WHERE t.name ^@ t2.name AND t.name != t2.name
    )
    AND NOT EXISTS (
      SELECT * FROM t t3
      WHERE t3.name ^@ t.name AND t3.name != t.name
    );
    

    Select all rows where there exists a smaller name elsewhere (starting with the same string), and there is no larger name elsewhere (starting with the same string).

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search