skip to Main Content

I have a very flat table that is a collection of swimmer names, distances, events (strokes), and times(float). What I would like is to get the only fastest time for each combination of distance and stroke. To be clear, there are multiple distances and multiple strokes (50 free, 100 free, 200 free, 50 back, 100 back .. and so on)

The data looks like the following where the "bold" names would be the records I want returned.

NAME DISTANCE STROKE_ID TIME
Susane Smith 50 1 74.33
Cassey Putnam 50 1 107.24
Kelly Maine 50 1 118.08
Susane Smith 100 1 168.4
Charlotte Castle 200 1 476.46
Susane Smith 50 2 83.23
Charlotte Castle 50 2 104.83
Katy Cooper 50 2 167.49
Susane Smith 100 2 174.93
Charlotte Castle 100 2 219.29
JoAnne Cooper 100 2 355.74
Maddy Franks 200 2 377.15
JoAnne Cooper 200 2 733.77
Charlotte Castle 50 3 138.5
Charlotte Castle 100 3 349.2
Charlotte Castle 200 3 676.12

2

Answers


  1. I created a table exactly like yours, entered the data and fooled around and got to the following query:

    SELECT NAME, DISTANCE, STROKE_ID, TIME FROM (
        SELECT NAME, DISTANCE, STROKE_ID, TIME, ROW_NUBMER() OVER (
            PARTITION BY DISTANCE, STROKE_ID ORDER BY TIME
        ) AS rn FROM <table name here>)
    SUB WHERE rn = 1;
    

    Lets break this down step by step:
    The query exists of two parts, the outer query and the inner query. First we’ll look at the inner query.

    (SELECT NAME, DISTANCE, STROKE_ID, TIME, ROW_NUMBER() OVER (
        PARTITION BY DISTANCE, STROKE_ID ORDER BY TIME
    ) AS rn FROM <table name here>) SUB
    

    This query adds an extra column to the data called ‘rn‘. This column is based on the DISTANCE paired with the STROKE_ID. Each different combination of the two gets a new number. So the combinations might look like this:

    DISTANCE STROKE_ID nr
    50 1 1
    50 2 2
    100 1 3
    100 2 4

    etc. It then sorts these results ascending by time. Mind that the data I showed above is not the actual newly created table, it just goes to show what the ‘rn‘ column does. Then this new ‘table’ gets the alias SUB (short for subquery).
    Then the outer query:

    SELECT NAME, DISTANCE, STROKE_ID, TIME FROM SUB WHERE rn = 1;
    

    The outer query selects the NAME, DISTANCE, STROKE_ID and TIME from SUB where rn = 1. We choose 1 for rn because this is the row with the fastest time.

    Note that the word SUB is used once in the total query, and twice in my explanation. This is only so it reads more easily, the uppermost(total) query is what you would actually type in your terminal.

    Hope this is what you meant.

    [EDIT]
    This was my output by the way:

    MariaDB [swimmers]> SELECT NAME, DISTANCE, STROKE_ID, TIME FROM (
    SELECT NAME, DISTANCE, STROKE_ID, TIME, ROW_NUMBER() OVER (
    PARTITION BY DISTANCE, STROKE_ID ORDER BY TIME)
    AS rn FROM testing) SUB WHERE rn = 1;
    +------------------+----------+-----------+--------+
    | NAME             | DISTANCE | STROKE_ID | TIME   |
    +------------------+----------+-----------+--------+
    | Susane Smith     |       50 |         1 |  74.33 |
    | Susane Smith     |       50 |         2 |  83.23 |
    | Charlotte Castle |       50 |         3 |  138.5 |
    | Susane Smith     |      100 |         1 |  168.4 |
    | Susane Smith     |      100 |         2 | 174.93 |
    | Charlotte Castle |      100 |         3 |  349.2 |
    | Charlotte Castle |      200 |         1 | 476.46 |
    | Maddy Franks     |      200 |         2 | 377.15 |
    | Charlotte Castle |      200 |         3 | 676.12 |
    +------------------+----------+-----------+--------+
    9 rows in set (0.001 sec)
    
    Login or Signup to reply.
  2. DISTINCT ON makes this very simple:

    SELECT DISTINCT ON (stroke_id, distance) *
    FROM   tbl
    ORDER  BY stroke_id, distance, time;
    

    Detailed explanation:

    Subtle detail: you seem to want to order by stroke_id first, hence the sort order in ORDER BY.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search