Question posted in Json
Our archive of expertly curated questions and answers provides insights and solutions to common problems related to this popular data interchange format. From parsing and manipulating JSON data to integrating it with various programming languages and web services, our archive has got you covered. Start exploring today and take your JSON skills to the next level

Json – Extracting coordinate data from a FeatureCollection in csv within R

TimBinding
August 4, 2023
135 views
0 votes
2 Answers

I’ve got data currently within csv, with a column called "journeyroute." This column has the following data [truncated due to size]:

{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401], [-4.095965, 50.40971], [-4.096064, 50.410069], [-4.09597, 50.410397]]}, "properties": {"distance": 4027.4, "name": "Raw", "times": [1690900467000, 1690900520000, 1690900522000, 1690900539000, 1690900550000, 1690900569000], "duration": 4923.0}}]}

There are 5,000 rows of data. What I’m trying to do is extract out the LineString data to use within R, but I’m getting stuck. Can anyone help please?

I’ve tried converting to JSON and then unnesting, but comes up with an error (code adapted from other answers using Google Earth Engine):

new_df <- df %>%
    mutate(geo = map(Journey.Route, ~ jsonlite::fromJSON(.))) %>%
    as.data.frame() %>%
    unnest(geo) %>%
    filter(geo != "FeatureCollection") %>%
    mutate(coord = rep(c("x", "y"))) %>%
    pivot_wider(names_from = coord, values_from = coordinates)

Error in `mutate()`:
ℹ In argument: `coord = rep(c("x", "y"))`.
Caused by error:
! `coord` must be size 5000 or 1, not 2.
Run `rlang::last_trace()` to see where the error occurred.

Expecting a sf geometry column of LineString coordinates.

Tags: json r sf

Answers

I suggest you don’t convert the geo to a frame, since it’s a rather nested list. A quick function that extracts the "LineString" component:

getpart <- function(x, type = "LineString", what = "coordinates") {
  out <- 
    if ("geometry" %in% names(x) && "type" %in% names(x[["geometry"]]) &&
          identical(x[["geometry"]][["type"]], type)) {
      x[["geometry"]][["coordinates"]]
    } else if ("features" %in% names(x)) {
      lapply(x[["features"]], getpart)
    }
  if (is.matrix(out)) {
    out <- as.data.frame(out)
  } else if (is.list(out)) {
    out <- Filter(length, out)
    if (all(sapply(out, inherits, "data.frame")) &&
          length(unique(sapply(out, ncol))) == 1L) {
      out <- do.call(rbind, out)
    }
  }
  out
}

With this, testing on a singleton, we see

json <- '{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401], [-4.095965, 50.40971], [-4.096064, 50.410069], [-4.09597, 50.410397]]}, "properties": {"distance": 4027.4, "name": "Raw", "times": [1690900467000, 1690900520000, 1690900522000, 1690900539000, 1690900550000, 1690900569000], "duration": 4923.0}}]}'
getpart(jsonlite::fromJSON(json, simplifyDataFrame = FALSE))
#          V1       V2
# 1 -4.095772 50.40939
# 2 -4.095781 50.40940
# 3 -4.095792 50.40940
# 4 -4.095965 50.40971
# 5 -4.096064 50.41007
# 6 -4.095970 50.41040

If your frame is well structured, then we can do something like:

df <- tibble(json = rep(json, 3)) %>%
  mutate(row = row_number())
df
# # A tibble: 3 × 2
#   json                                                                                                           row
#   <chr>                                                                                                        <int>
# 1 "{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point…     1
# 2 "{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point…     2
# 3 "{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point…     3
df %>%
  mutate(geo = lapply(json, function(js) getpart(jsonlite::fromJSON(js, simplifyDataFrame = FALSE)))) %>%
  select(-json) %>%
  unnest(geo)
# # A tibble: 18 × 3
#      row    V1    V2
#    <int> <dbl> <dbl>
#  1     1 -4.10  50.4
#  2     1 -4.10  50.4
#  3     1 -4.10  50.4
#  4     1 -4.10  50.4
#  5     1 -4.10  50.4
#  6     1 -4.10  50.4
#  7     2 -4.10  50.4
#  8     2 -4.10  50.4
#  9     2 -4.10  50.4
# 10     2 -4.10  50.4
# 11     2 -4.10  50.4
# 12     2 -4.10  50.4
# 13     3 -4.10  50.4
# 14     3 -4.10  50.4
# 15     3 -4.10  50.4
# 16     3 -4.10  50.4
# 17     3 -4.10  50.4
# 18     3 -4.10  50.4

As we are dealing with GeoJSON string, it can be parsed with sf::st_read() or perhaps with gejsonsf::geojson_sfc() for some performance boost (~ 2x).

Rowwise grouping to access one row at a time; keeping only LINESTRING geometries (presumably one per FeatureCollection, as in provided sample).

library(dplyr)
library(sf)
#> Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(geojsonsf)

json_str <- '{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401], [-4.095965, 50.40971], [-4.096064, 50.410069], [-4.09597, 50.410397]]}, "properties": {"distance": 4027.4, "name": "Raw", "times": [1690900467000, 1690900520000, 1690900522000, 1690900539000, 1690900550000, 1690900569000], "duration": 4923.0}}]}'

# 100-row test sample
df_100 <- tibble(journey_id = 1:100, journeyroute = rep(json_str, 100))
df_100
#> # A tibble: 100 × 2
#>    journey_id journeyroute                                                      
#>         <int> <chr>                                                             
#>  1          1 "{"type": "FeatureCollection", "features": [{"type": "Fe…
#>  2          2 "{"type": "FeatureCollection", "features": [{"type": "Fe…
#>  3          3 "{"type": "FeatureCollection", "features": [{"type": "Fe…
#> ...

microbenchmark::microbenchmark(
  sf = {
    # parse GeoJSON strings with sf / GEOS
    routes_sf <- df_100 %>% 
      rowwise() %>% 
      mutate(geometry = st_read(journeyroute, quiet = TRUE) %>% 
                        st_geometry() %>% 
                        `[`(st_geometry_type(.) == "LINESTRING"), .keep = "unused") %>% 
      ungroup() %>% 
      st_as_sf()
  },
  geojson_sf = {
    # parse GeoJSON strings with geojsonsf
    routes_gj <- df_100 %>% 
      rowwise() %>% 
      mutate(geometry = geojson_sfc(journeyroute) %>% 
                        `[`(st_geometry_type(.) == "LINESTRING"), .keep = "unused") %>% 
      ungroup() %>% 
      st_as_sf()
  }
)

Benchmark results and resulting sf object:

#> Unit: milliseconds
#>        expr      min       lq     mean   median       uq      max neval cld
#>          sf 437.4351 453.1961 476.8028 464.1172 487.9901 628.0495   100  a 
#>  geojson_sf 198.3025 207.9465 219.1129 212.6965 221.7101 309.2461   100   b

routes_sf
#> Simple feature collection with 100 features and 1 field
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: -4.096064 ymin: 50.40939 xmax: -4.095772 ymax: 50.4104
#> Geodetic CRS:  WGS 84
#> # A tibble: 100 × 2
#>    journey_id                                                           geometry
#>         <int>                                                   <LINESTRING [°]>
#>  1          1 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  2          2 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  3          3 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  4          4 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  5          5 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  6          6 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  7          7 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  8          8 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  9          9 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#> 10         10 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#> # ℹ 90 more rows

^{Created on 2023-08-04 with reprex v2.0.2}

Please signup or login to give your own answer.

Click here to cancel reply.