
I have a MySQL table with two "datetime" columns, begintime and endtime:

+---------------------+---------------------+
| begintime           | endtime             |
+---------------------+---------------------+
| 2024-05-22 10:13:23 | 2024-05-31 13:37:34 |
| 2024-05-30 17:03:21 | 2024-05-31 16:01:25 |
| 2024-05-30 17:03:21 | 2024-05-31 16:01:25 |
| 2024-05-30 17:03:21 | 2024-05-31 16:01:25 |
| 2024-05-31 15:00:00 | 2024-05-31 15:00:03 |
| 2024-05-31 15:01:32 | 2024-05-31 16:01:26 |
+---------------------+---------------------+

The table also contains rows whose begintime matches that of another row but whose endtime is earlier than that row's. For example:

| 2024-05-22 10:13:23 | 2024-05-31 12:02:18 |

Here begintime is the same as in the first row and endtime is less than in that row.

How can I filter these rows out of the table using MySQL or maybe Python’s pandas?

3 Answers


  1. The easiest way would be to GROUP BY begintime and take the maximum endtime:

    SELECT begintime, MAX(endtime) AS endtime
    FROM your_table
    GROUP BY begintime
    
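  2. You can find these rows with a self-join in MySQL. Note that this query returns the rows for which a later endtime exists for the same begintime, that is, the rows to be filtered out: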

    SELECT table1.*
    FROM your_table table1
    JOIN your_table table2
    ON table1.begintime = table2.begintime AND table1.endtime < table2.endtime;
    
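  3. You can use a self-join in SQL. The following statement deletes every row for which a later endtime exists for the same begintime, so run it only if you want to remove those rows from the table permanently: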

    DELETE t1
    FROM your_table t1
    INNER JOIN your_table t2 ON t1.begintime = t2.begintime
    WHERE t1.endtime < t2.endtime;
    

    If you want to achieve the same using Python’s pandas:

    import pandas as pd
    
    # Assuming you have fetched your data into a DataFrame named df
    # Convert 'begintime' and 'endtime' columns to datetime dtype if they are not already
    df['begintime'] = pd.to_datetime(df['begintime'])
    df['endtime'] = pd.to_datetime(df['endtime'])
    
    # Sort the DataFrame by 'begintime' and 'endtime' columns
    df.sort_values(by=['begintime', 'endtime'], inplace=True)
    
    # Keep the rows whose 'endtime' equals the maximum 'endtime' of their 'begintime' group
    max_endtime = df.groupby('begintime')['endtime'].transform('max')
    filtered_df = df[df['endtime'] == max_endtime]
    
    # Reset the index so the rows are numbered from 0 again
    filtered_df = filtered_df.reset_index(drop=True)
    
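To see how the GROUP BY query and the self-join SELECT behave on the question's sample data, here is a minimal sketch using Python's built-in sqlite3 module. Using SQLite instead of MySQL is an assumption made only so the example runs without a server; both queries use plain SQL, and the timestamps are stored as ISO-formatted text, which sorts the same way as the datetimes. The variable names `kept` and `superseded` are made up for illustration.

```python
import sqlite3

# Sample rows from the question, plus the example row with the earlier endtime.
rows = [
    ("2024-05-22 10:13:23", "2024-05-31 13:37:34"),
    ("2024-05-22 10:13:23", "2024-05-31 12:02:18"),
    ("2024-05-30 17:03:21", "2024-05-31 16:01:25"),
    ("2024-05-30 17:03:21", "2024-05-31 16:01:25"),
    ("2024-05-30 17:03:21", "2024-05-31 16:01:25"),
    ("2024-05-31 15:00:00", "2024-05-31 15:00:03"),
    ("2024-05-31 15:01:32", "2024-05-31 16:01:26"),
]

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (begintime TEXT, endtime TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?, ?)", rows)

# GROUP BY approach: one row per begintime, carrying the latest endtime.
kept = conn.execute(
    "SELECT begintime, MAX(endtime) AS endtime "
    "FROM your_table GROUP BY begintime ORDER BY begintime"
).fetchall()

# Self-join SELECT: rows for which a later endtime exists for the same begintime.
superseded = conn.execute(
    "SELECT table1.* FROM your_table table1 "
    "JOIN your_table table2 "
    "ON table1.begintime = table2.begintime AND table1.endtime < table2.endtime"
).fetchall()

print("kept:", kept)
print("superseded:", superseded)
conn.close()
```

On this data the GROUP BY query yields four rows, because the three identical rows starting at 2024-05-30 17:03:21 collapse into one, while the self-join returns only the row ending at 2024-05-31 12:02:18. If exact duplicates must be preserved, the GROUP BY form is not a drop-in replacement for deleting the superseded rows.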