skip to Main Content

I have in my postgres database, the following table:

CREATE TABLE trips(id int, lat_init double precision, lon_init double precision,
lat_end double precision, lon_end double precision);

INSERT INTO trips(id, lat_init, lon_init, lat_end, lon_end)
VALUES (1,53.348060,-1.504068,53.360690,-1.457364),
(2,53.427651,-1.355329,53.380441,-1.536918),
(3,53.365660,-1.497837,53.363838,-1.388023),
(4,53.380349,-1.460028,53.429298,-1.359443),
(5,53.345526,-1.449195,53.407839,-1.501819);

So that (considering the example table values above):

SELECT * FROM  trips;
id  lat_init     lon_init    lat_end      lon_end
1   53.34806    -1.504068   53.36069    -1.457364
2   53.427651   -1.355329   53.380441   -1.536918
3   53.36566    -1.497837   53.363838   -1.388023
4   53.380349   -1.460028   53.429298   -1.359443
5   53.345526   -1.449195   53.407839   -1.501819

In SQL, I can filter rows that start or end outside the bounding box defined by the coordinates (row 2 and 4):

  • Upper Left: (53.42361, -1.57495),
  • Lower Right: (53.32037, -1.38063)

Using:

select t.*
from trips t
where lat_init >= 53.32037 and lat_init <= 53.42361 and
      lat_end >= 53.32037 and lat_end <= 53.42361 and
      lon_init >= -1.57495 and lon_init <= -1.38063 and
     lon_end >= -1.57495 and lon_end <= -1.38063;

id  lat_init     lon_init    lat_end     lon_end
1   53.34806    -1.504068   53.36069    -1.457364
3   53.36566    -1.497837   53.363838   -1.388023
5   53.345526   -1.449195   53.407839   -1.501819

I want to apply this filtering procedure in pandas. The trips table is exported to a csv file called my-trips.csv.

So that:

df = pd.read_csv('my-trips.csv')
df
    id   lat_init    lon_init    lat_end      lon_end
0   1   53.348060   -1.504068   53.360690   -1.457364
1   2   53.427651   -1.355329   53.380441   -1.536918
2   3   53.365660   -1.497837   53.363838   -1.388023
3   4   53.380349   -1.460028   53.429298   -1.359443
4   5   53.345526   -1.449195   53.407839   -1.501819

How then do I perform the filtering as I do in the SQL above?

2

Answers


  1. Option 1

    One way to do it is as follows

    conditions = [(df['lat_init'] >= 53.32037) & (df['lat_init'] <= 53.42361),
                    (df['lat_end'] >= 53.32037) & (df['lat_end'] <= 53.42361),
                    (df['lon_init'] >= -1.57495) & (df['lon_init'] <= -1.38063),
                    (df['lon_end'] >= -1.57495) & (df['lon_end'] <= -1.38063)]
    
    df_new = df[conditions[0] & conditions[1] & conditions[2] & conditions[3]]
    
    [Out]:
    
       id   lat_init  lon_init    lat_end   lon_end
    0   1  53.348060 -1.504068  53.360690 -1.457364
    2   3  53.365660 -1.497837  53.363838 -1.388023
    4   5  53.345526 -1.449195  53.407839 -1.501819
    

    A one-liner would look like the following

    df_new = df[(df['lat_init'] >= 53.32037) & (df['lat_init'] <= 53.42361) & (df['lat_end'] >= 53.32037) & (df['lat_end'] <= 53.42361) & (df['lon_init'] >= -1.57495) & (df['lon_init'] <= -1.38063) & (df['lon_end'] >= -1.57495) & (df['lon_end'] <= -1.38063)]
    
    [Out]:
    
       id   lat_init  lon_init    lat_end   lon_end
    0   1  53.348060 -1.504068  53.360690 -1.457364
    2   3  53.365660 -1.497837  53.363838 -1.388023
    4   5  53.345526 -1.449195  53.407839 -1.501819
    

    Option 2

    One can also use dataframe.query as follows

    df_new = df.query('lat_init >= 53.32037 and lat_init <= 53.42361 and lat_end >= 53.32037 and lat_end <= 53.42361 and lon_init >= -1.57495 and lon_init <= -1.38063 and lon_end >= -1.57495 and lon_end <= -1.38063')
    
    [Out]:
    
       id   lat_init  lon_init    lat_end   lon_end
    0   1  53.348060 -1.504068  53.360690 -1.457364
    2   3  53.365660 -1.497837  53.363838 -1.388023
    4   5  53.345526 -1.449195  53.407839 -1.501819
    
    Login or Signup to reply.
  2. I post the code below should work:

    import pandas as pd
    df = pd.read_clipboard()
    df = df[
        df['lat_init'].between(53.32037, 53.42361, inclusive='both') &
        df['lon_init'].between(-1.57495, -1.38063, inclusive='both') &
        df['lat_end'].between(53.32037, 53.42361, inclusive='both') &
        df['lon_end'].between(-1.57495, -1.38063, inclusive='both')
        ]
    print(df)
    
    
       id   lat_init  lon_init    lat_end   lon_end
    0   1  53.348060 -1.504068  53.360690 -1.457364
    2   3  53.365660 -1.497837  53.363838 -1.388023
    4   5  53.345526 -1.449195  53.407839 -1.501819
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search