I am making a MySQL query of a table with thousands of records. What I’m really trying to do is find the next and previous rows that surround a particular ID. The issue is that when sorting the table in a specific way, there is no correlation between IDs (I can’t just search for id > $current_id LIMIT 1, for example, because the needed ID in the next row might or might not actually be higher. Here is an example:
ID Name Date
4 Fred 1999-01-04
6 Bill 2002-04-02
7 John 2002-04-02
3 Sara 2002-04-02
24 Beth 2007-09-18
1 Dawn 2007-09-18
Say I know I want the records that come directly before and after John (ID = 7). In this case, the ID of the record after that row is actually a lower number. The table is sorted by date first and then by name, but there are many entires with the same date – so I can’t just look for the next date, either. What is the best approach to find, in this case, the row before and (separately) the row after ID 7?
Thank you for any help.
3
Answers
You can use a window function called ROW_NUM in this way.
ROW_NUM() OVER()
. This will number every row in the table consecutively. Now you search for your I’d and you also get the Row number. For example, you search for ID=7 and you get row number 35. Now you can search for row number from 34 to 36 to get rows below and above the one with I’d 7.This is what comes to mind:
Using the row number window function, you can compare two view of the table with
id = 7
and get the row you need. You can change the condition in the subquery to suit your needs, e.g.,p2.name = 'John' and p2.date = '2002-04-02'
.Here’s a dbfiddle demonstrating: https://www.db-fiddle.com/f/mpQBcijLFRWBBUcWa3UcFY/2
Alternately, you can simplify the syntax a bit and avoid the redundancy using a CTE like this:
As others have suggested you can use window functions for this, but I would use LEAD() and LAG() instead of ROW_NUMBER().
With
thousands of records
(very small) this should be very fast but if you expect it to grow to hundreds of thousands, or even millions of rows you should try to limit the amount of work being done in the inner query. Sorting millions of rows and assigning prev and next values to all of them, just to use one row would be excessive.Assuming your example of John (ID = 7) you could use the
Date
to constrain the inner query. If the adjacent records would always be within one month then you could do something like –Without knowing more detail about the distribution of your data, I am only guessing but hopefully you get the idea.