skip to Main Content

I’ve got data currently within csv, with a column called "journeyroute." This column has the following data [truncated due to size]:

{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401], [-4.095965, 50.40971], [-4.096064, 50.410069], [-4.09597, 50.410397]]}, "properties": {"distance": 4027.4, "name": "Raw", "times": [1690900467000, 1690900520000, 1690900522000, 1690900539000, 1690900550000, 1690900569000], "duration": 4923.0}}]}

There are 5,000 rows of data. What I’m trying to do is extract out the LineString data to use within R, but I’m getting stuck. Can anyone help please?

I’ve tried converting to JSON and then unnesting, but comes up with an error (code adapted from other answers using Google Earth Engine):

new_df <- df %>%
    mutate(geo = map(Journey.Route, ~ jsonlite::fromJSON(.))) %>%
    as.data.frame() %>%
    unnest(geo) %>%
    filter(geo != "FeatureCollection") %>%
    mutate(coord = rep(c("x", "y"))) %>%
    pivot_wider(names_from = coord, values_from = coordinates)

Error in `mutate()`:
ℹ In argument: `coord = rep(c("x", "y"))`.
Caused by error:
! `coord` must be size 5000 or 1, not 2.
Run `rlang::last_trace()` to see where the error occurred.

Expecting a sf geometry column of LineString coordinates.

2

Answers


  1. I suggest you don’t convert the geo to a frame, since it’s a rather nested list. A quick function that extracts the "LineString" component:

    getpart <- function(x, type = "LineString", what = "coordinates") {
      out <- 
        if ("geometry" %in% names(x) && "type" %in% names(x[["geometry"]]) &&
              identical(x[["geometry"]][["type"]], type)) {
          x[["geometry"]][["coordinates"]]
        } else if ("features" %in% names(x)) {
          lapply(x[["features"]], getpart)
        }
      if (is.matrix(out)) {
        out <- as.data.frame(out)
      } else if (is.list(out)) {
        out <- Filter(length, out)
        if (all(sapply(out, inherits, "data.frame")) &&
              length(unique(sapply(out, ncol))) == 1L) {
          out <- do.call(rbind, out)
        }
      }
      out
    }
    

    With this, testing on a singleton, we see

    json <- '{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401], [-4.095965, 50.40971], [-4.096064, 50.410069], [-4.09597, 50.410397]]}, "properties": {"distance": 4027.4, "name": "Raw", "times": [1690900467000, 1690900520000, 1690900522000, 1690900539000, 1690900550000, 1690900569000], "duration": 4923.0}}]}'
    getpart(jsonlite::fromJSON(json, simplifyDataFrame = FALSE))
    #          V1       V2
    # 1 -4.095772 50.40939
    # 2 -4.095781 50.40940
    # 3 -4.095792 50.40940
    # 4 -4.095965 50.40971
    # 5 -4.096064 50.41007
    # 6 -4.095970 50.41040
    

    If your frame is well structured, then we can do something like:

    df <- tibble(json = rep(json, 3)) %>%
      mutate(row = row_number())
    df
    # # A tibble: 3 × 2
    #   json                                                                                                           row
    #   <chr>                                                                                                        <int>
    # 1 "{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point…     1
    # 2 "{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point…     2
    # 3 "{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point…     3
    df %>%
      mutate(geo = lapply(json, function(js) getpart(jsonlite::fromJSON(js, simplifyDataFrame = FALSE)))) %>%
      select(-json) %>%
      unnest(geo)
    # # A tibble: 18 × 3
    #      row    V1    V2
    #    <int> <dbl> <dbl>
    #  1     1 -4.10  50.4
    #  2     1 -4.10  50.4
    #  3     1 -4.10  50.4
    #  4     1 -4.10  50.4
    #  5     1 -4.10  50.4
    #  6     1 -4.10  50.4
    #  7     2 -4.10  50.4
    #  8     2 -4.10  50.4
    #  9     2 -4.10  50.4
    # 10     2 -4.10  50.4
    # 11     2 -4.10  50.4
    # 12     2 -4.10  50.4
    # 13     3 -4.10  50.4
    # 14     3 -4.10  50.4
    # 15     3 -4.10  50.4
    # 16     3 -4.10  50.4
    # 17     3 -4.10  50.4
    # 18     3 -4.10  50.4
    
    Login or Signup to reply.
  2. As we are dealing with GeoJSON string, it can be parsed with sf::st_read() or perhaps with gejsonsf::geojson_sfc() for some performance boost (~ 2x).

    Rowwise grouping to access one row at a time; keeping only LINESTRING geometries (presumably one per FeatureCollection, as in provided sample).

    library(dplyr)
    library(sf)
    #> Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
    library(geojsonsf)
    
    json_str <- '{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401], [-4.095965, 50.40971], [-4.096064, 50.410069], [-4.09597, 50.410397]]}, "properties": {"distance": 4027.4, "name": "Raw", "times": [1690900467000, 1690900520000, 1690900522000, 1690900539000, 1690900550000, 1690900569000], "duration": 4923.0}}]}'
    
    # 100-row test sample
    df_100 <- tibble(journey_id = 1:100, journeyroute = rep(json_str, 100))
    df_100
    #> # A tibble: 100 × 2
    #>    journey_id journeyroute                                                      
    #>         <int> <chr>                                                             
    #>  1          1 "{"type": "FeatureCollection", "features": [{"type": "Fe…
    #>  2          2 "{"type": "FeatureCollection", "features": [{"type": "Fe…
    #>  3          3 "{"type": "FeatureCollection", "features": [{"type": "Fe…
    #> ...
    
    microbenchmark::microbenchmark(
      sf = {
        # parse GeoJSON strings with sf / GEOS
        routes_sf <- df_100 %>% 
          rowwise() %>% 
          mutate(geometry = st_read(journeyroute, quiet = TRUE) %>% 
                            st_geometry() %>% 
                            `[`(st_geometry_type(.) == "LINESTRING"), .keep = "unused") %>% 
          ungroup() %>% 
          st_as_sf()
      },
      geojson_sf = {
        # parse GeoJSON strings with geojsonsf
        routes_gj <- df_100 %>% 
          rowwise() %>% 
          mutate(geometry = geojson_sfc(journeyroute) %>% 
                            `[`(st_geometry_type(.) == "LINESTRING"), .keep = "unused") %>% 
          ungroup() %>% 
          st_as_sf()
      }
    )
    

    Benchmark results and resulting sf object:

    #> Unit: milliseconds
    #>        expr      min       lq     mean   median       uq      max neval cld
    #>          sf 437.4351 453.1961 476.8028 464.1172 487.9901 628.0495   100  a 
    #>  geojson_sf 198.3025 207.9465 219.1129 212.6965 221.7101 309.2461   100   b
    
    routes_sf
    #> Simple feature collection with 100 features and 1 field
    #> Geometry type: LINESTRING
    #> Dimension:     XY
    #> Bounding box:  xmin: -4.096064 ymin: 50.40939 xmax: -4.095772 ymax: 50.4104
    #> Geodetic CRS:  WGS 84
    #> # A tibble: 100 × 2
    #>    journey_id                                                           geometry
    #>         <int>                                                   <LINESTRING [°]>
    #>  1          1 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
    #>  2          2 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
    #>  3          3 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
    #>  4          4 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
    #>  5          5 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
    #>  6          6 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
    #>  7          7 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
    #>  8          8 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
    #>  9          9 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
    #> 10         10 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
    #> # ℹ 90 more rows
    

    Created on 2023-08-04 with reprex v2.0.2

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search