Ubuntu - Convert a list into a tibble with nested columns

mindlessgreen
July 13, 2024
166 views
5 votes
5 Answers

I would like to convert a list like this into a tibble.

lst <- list(
  "A"=list(
    "Category"="A",
    "Team"=c("x"),
    "City"="N"
  ),
  "C"=list(
    "Category"="C",
    "Team"=c(
      "x","v","z"
    ),
    "City"="V"
  ),
  "F"=list(
    "Category"="F",
    "Team"=c("o")
  ),
  "K"=list(
    "Category"="K",
    "Team"=c(
      "f","h","y"
    ),
    "City"="T"
  ),
  "B"=list(
    "Category"="B",
    "City"=c("Q")
  )
)

There can be multiple items in the list. I have only three in this example: Category, Team and City. Category will always have only one unique value without duplication while the other fields can be one, many or missing.

I need one unique Category per row. Any other column that has multiple values for a category must be come a nested list.

This is the output I am looking for:

  Category Team      City 
  <chr>    <list>    <chr>
1 A        <chr [1]> N    
2 C        <chr [3]> V    
3 F        <chr [1]> NA   
4 K        <chr [3]> T    
5 B        <chr [1]> Q

This was manually created by

library(purrr)
library(dplyr)

d <- map_dfr(lst,~as_tibble(.x))
reframe(d,across(Team,list),.by=Category) |>
  left_join(distinct(select(d,-Team),Category,.keep_all=TRUE), by="Category")

I am looking for a solution that would work for any number of fields/columns and any number of items in any column (except Category which will have only one item).

One of the many failed attempts:

map_dfr(lst,~as_tibble(.x)) |>
  group_by(Category) |> 
  summarise(
    across(everything(), ~ {
      if (n_distinct(.) == 1) {
        first(.)
      } else {
        list(.)
      }
    })
  )

Error in `summarise()`:
ℹ In argument: `across(...)`.
Caused by error:
! `Team` must return compatible vectors across groups.
ℹ Result of type <character> for group A: `Category = "A"`.
ℹ Result of type <list> for group C: `Category = "C"`.
Run `rlang::last_trace()` to see where the error occurred.

Session

R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Ubuntu 22.04.4 LTS
dplyr_1.1.4
tidyr_1.3.1
purrr_1.0.2

Tags: purrr r tidyverse

Answers

Maybe this solve your problem

lst %>% 
  dplyr::bind_rows() %>% 
  tidyr::nest(Team = Team)

# A tibble: 5 × 3
  Category City  Team            
  <chr>    <chr> <list>          
1 A        N     <tibble [1 × 1]>
2 C        V     <tibble [3 × 1]>
3 F        NA    <tibble [1 × 1]>
4 K        T     <tibble [3 × 1]>
5 B        Q     <tibble [1 × 1]>

Perhaps something like:

list_transpose(lst, default = NA, template = unique(unlist(map(lst, names)))) |> 
  as_tibble() |> 
  mutate(across(where(is.list), (l) if (all(lengths(l) == 1)) unlist(l) else l))

# A tibble: 5 × 3
  Category Team         City 
  <chr>    <named list> <chr>
1 A        <chr [1]>    N    
2 C        <chr [3]>    V    
3 F        <chr [1]>    NA   
4 K        <chr [3]>    T    
5 B        <lgl [1]>    Q

I hope this might work out well. tested on a new element with a new column.

lst <- list(
  .... PREVIOUS lst LIST ....

  my_addition = list(
    Category = "Z",
    Team = state.name[1:2],
    City = letters[1:3],
    Zip = state.abb[1:4]
    
  )
)

lst_to_df <- function(lst_item) {
  tibble(
    Category = lst_item$Category,
    !!!map(lst_item[-1], list)
  )
}

map_dfr(lst, lst_to_df, .id = "List_Name") # remove `.id` to change from `my_addition` to `Z` as you probably really want

Category Team      City      Zip      
  <chr>    <list>    <list>    <list>   
1 A        <chr [1]> <chr [1]> <NULL>   
2 C        <chr [3]> <chr [1]> <NULL>   
3 F        <chr [1]> <NULL>    <NULL>   
4 K        <chr [3]> <chr [1]> <NULL>   
5 B        <NULL>    <chr [1]> <NULL>   
6 Z        <chr [2]> <chr [3]> <chr [4]>
>

Here is another solution that works for any number of fields/columns and any number of items in any column. Since you wish to mantain a single row by Category level, each column correspond to a list with the corresponding distinct values.

library(rlang)

# Bind lists as tibble
df <- lst |>
  bind_rows() 

# Get the column name used as key
key <- colnames(df)[1]

# Map over all columns and nest distinct values
map(colnames(df)[-1], function(x){
  df |>
    select(all_of(c(key, x))) |>
    group_by(!!sym(key)) |>
    distinct(!!sym(x)) |>
    nest(!!sym(x):= x)
}) |>
  # Reduce the lists using left_join
  reduce(left_join, by = key) 

# A tibble: 5 × 3
# Groups:   Category [5]
#  Category Team             City            
#  <chr>    <list>           <list>          
#1 A        <tibble [1 × 1]> <tibble [1 × 1]>
#2 C        <tibble [3 × 1]> <tibble [1 × 1]>
#3 F        <tibble [1 × 1]> <tibble [1 × 1]>
#4 K        <tibble [3 × 1]> <tibble [1 × 1]>
#5 B        <tibble [1 × 1]> <tibble [1 × 1]>

- AndreWildberg
- July 13, 2024 at 1:56 am
- 0 votes
0
An approach using unnest_wider
```
library(dplyr)
library(tidyr)

tibble(lst) %>% unnest_wider(lst)
# A tibble: 5 × 3
  Category Team      City 
  <chr>    <list>    <chr>
1 A        <chr [1]> N    
2 C        <chr [3]> V    
3 F        <chr [1]> NA   
4 K        <chr [3]> T    
5 B        <NULL>    Q
```
Note that the non-existing Team is kept as NULL.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Ubuntu – Convert a list into a tibble with nested columns

Answers