skip to Main Content

How to order by timestamp and group based on score

  1. Entities starting with a prefix (say a1 are similar), but we need to
    pickup the one with the highest score, in this case a1-1
  2. For a2 it should pickup a2-1, since it has a score of 0.6
  3. For a3, it should be a3-0
  4. For a4, it should be a4-1, since its recent
  5. Assume time is mentioned based on creation time ascending (i.e t1, t2 …)

entities table contains the following

1. a1-0, x0, 0.8, t1
2. a2-0, x1, 0.5, t2
3. a2-1, x2, 0.6, t3
4. a3-0, x3, 0.8, t4
5. a2-2, x4, 0.7, t5
6. a1-1, x5, 0.9, t6
7. a4-0, x6, 0.6, t7
8. a4-1, x7, 0.6, t8
9. a1-2, x8, 0.85, t9

This is the output I expect, what SQL query should I use

  1. a4-1
  2. a3-0
  3. a2-1
  4. a1-1

2

Answers


  1. User Postgresql’s brilliant distinct on with custom order.

    select distinct on (split_part(ent_id, '-', 1)) ent_id
    from the_table
    order by (split_part(ent_id, '-', 1)), score desc, ts;
    
    ent_id
    a1-1
    a2-2
    a3-0
    a4-0

    Pls. note that ‘a2-2’ has higher score than ‘a2-1’.
    Live demo here

    Login or Signup to reply.
  2. From the doc:

    SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the “first row” of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.

    You can combine that with string functions to establish sets based on the prefix. Demo:

    SELECT DISTINCT ON( (string_to_array(code,'-'))[1] )
           code
    FROM entities
    ORDER BY (string_to_array(code,'-'))[1] DESC, 
             score         DESC, 
             creation_time DESC;
    
    code
    a4-1
    a3-0
    a2-2 (this had score of 0.7, which is more than a2-1’s 0.6)
    a1-1
    • (string_to_array(code,'-')) cuts up the code at each - producing an array, then [1] takes its first element, which will be the prefix.
    • ORDER BY dictates that the prefix is primarily responsible for the position. You wanted highest prefix first so you need DESCending order. DISTINCT ON will discard all but one record for each prefix.
    • Out of each set of records sharing a prefix, you wanted the top score, hence the second ORDER BY expression is score DESC.
    • The latest timestamp will have the highest value, so to pick a4-1 over a4-0 based on the more recent creation time, you need creation_time DESC.

    It’s good to normalise your structure. You’re holding multiple pieces of information as a single code, which forces you to use string functions in order to separate them and process individually:

    • If that’s how you’re receiving the data, you can add generated columns to parse the code automatically or set up a view on top of the table.
    • If you have control over the construction of these records, save each underlying piece of information into a separate column. Using a2-0 as an example, put the primary group identifier a into one column, secondary 2 into another, order number 0 into another. You might prefer to just split into prefix and suffix, but the extra granularity might be worth it if you consider that text sorts a800 ahead, as a lower value than a9 (demo).
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search