Debian - purrr and map: how to save intermediate computations?

larry77
June 27, 2024
2 Answers

Please consider the snippet at the end of the post.
I would like to be able to save (possibly as an RDS) the results of the computations while they progress (e.g. every time a new 10% of the list is processed). How can I do that?

library(tidyverse)
ll <- 1:1000
res <- map(ll, \(x) cos(x))

sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Debian GNU/Linux 12 (bookworm)
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.11.0 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
#>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
#>  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Europe/Brussels
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] lubridate_1.9.3 forcats_1.0.0   stringr_1.5.1   dplyr_1.1.4    
#>  [5] purrr_1.0.2     readr_2.1.5     tidyr_1.3.1     tibble_3.2.1   
#>  [9] ggplot2_3.5.1   tidyverse_2.0.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.5      compiler_4.4.1    reprex_2.1.0      tidyselect_1.2.1 
#>  [5] scales_1.3.0      yaml_2.3.8        fastmap_1.1.1     R6_2.5.1         
#>  [9] generics_0.1.3    knitr_1.46        munsell_0.5.1     R.cache_0.16.0   
#> [13] tzdb_0.4.0        pillar_1.9.0      R.utils_2.12.3    rlang_1.1.3      
#> [17] utf8_1.2.4        stringi_1.8.4     xfun_0.43         fs_1.6.4         
#> [21] timechange_0.3.0  cli_3.6.2         withr_3.0.0       magrittr_2.0.3   
#> [25] digest_0.6.35     grid_4.4.1        hms_1.1.3         lifecycle_1.0.4  
#> [29] R.methodsS3_1.8.2 R.oo_1.26.0       vctrs_0.6.5       evaluate_0.23    
#> [33] glue_1.7.0        styler_1.10.3     fansi_1.0.6       colorspace_2.1-0 
#> [37] rmarkdown_2.26    tools_4.4.1       pkgconfig_2.0.3   htmltools_0.5.8.1

Created on 2024-06-27 with reprex v2.1.0

Answers

- BenBolker
- June 27, 2024 at 10:34 pm
- 0 votes
Turns out there’s a package for that, currr ("checkpoint" + purrr). It doesn’t save precisely in the form you specified (but see below for how to access intermediate results), but these functions (cp_map() for example)

create a secret folder in your current working directory and save the results if they reach a given checkpoint. This way if you rerun the code, it reads the result from the cache folder and starts to evaluate where you finished. [slightly edited from original]

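cp_map() has a cp_options= argument that lets you specify how often to checkpoint (i.e., how many checkpoints per job). The number of checkpoints and the folder where the results are stored can also be set globally through options(), as in the code below.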
```
library(currr)
options(currr.n_checkpoint = 10, currr.folder = "checkpoints")
cc <- cp_map(1:1000, name = "cos_results", cos)
list.files("checkpoints/cos_results")
```
If you want to look at these intermediate outputs directly (rather than using them via the package as an automated checkpointing system) you’ll have to figure out what these files are: it looks like the out* files are storing chunks of output (e.g. out_301.rds has the results for cos(301:400)).
```
 [1] "et_1.rds"    "et_101.rds"  "et_201.rds"  "et_301.rds"  "et_401.rds" 
 [6] "et_501.rds"  "et_601.rds"  "et_701.rds"  "et_801.rds"  "et_901.rds" 
[11] "f.rds"       "id_1.rds"    "id_101.rds"  "id_201.rds"  "id_301.rds" 
[16] "id_401.rds"  "id_501.rds"  "id_601.rds"  "id_701.rds"  "id_801.rds" 
[21] "id_901.rds"  "meta.rds"    "out_1.rds"   "out_101.rds" "out_201.rds"
[26] "out_301.rds" "out_401.rds" "out_501.rds" "out_601.rds" "out_701.rds"
[31] "out_801.rds" "out_901.rds" "st_1.rds"    "st_101.rds"  "st_201.rds" 
[36] "st_301.rds"  "st_401.rds"  "st_501.rds"  "st_601.rds"  "st_701.rds" 
[41] "st_801.rds"  "st_901.rds"  "x.rds"      
```
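If you do want to pull the partial results back into R by hand, something along these lines might work. It is only a sketch that relies on the file naming seen in the listing above (out_<start>.rds); that layout is an observation about the cache folder, not a documented interface, and read_out_chunks() is a made-up helper, not a currr function.

```r
# Hypothetical helper, not part of currr: collect the chunked results from a
# cache folder whose files follow the out_<start>.rds pattern listed above.
read_out_chunks <- function(dir) {
  files <- list.files(dir, pattern = "^out_[0-9]+\\.rds$", full.names = TRUE)
  # Sort by the numeric start index; plain alphabetical order would put
  # out_101.rds before out_11.rds.
  start <- as.integer(sub("^out_([0-9]+)\\.rds$", "\\1", basename(files)))
  chunks <- lapply(files[order(start)], readRDS)
  # c() concatenates lists as well as atomic vectors.
  do.call(c, chunks)
}

# Usage, assuming the folder from the listing above exists:
# partial <- read_out_chunks("checkpoints/cos_results")
```

Sorting on the extracted number rather than on the file name keeps the chunks in the order of the original input.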

- LMC
- June 27, 2024 at 10:46 pm
- 0 votes
Another option is to create a functional operator and store the output in the function environment. Then you can retrieve this information afterwards (or save it using saveRDS()):
```
library(purrr)

funop <- function(f, niter) {
  force(f)
  force(niter)
  i <- 0
  output <- vector("list", length = niter)
  function(...) {
    i <<- i + 1
    val <- f(...)
    if (i %% 10 == 0) {
      output[[i]] <<- val # saveRDS() call here
    }
    return(val)
  }
}

ll <- 1:1000 # input from the question
mycos <- funop(cos, length(ll))
res <- map(ll, mycos)

compact(environment(mycos)$output) |> head(3)
# [[1]]
# [1] -0.8390715
# 
# [[2]]
# [1] 0.4080821
# 
# [[3]]
# [1] 0.1542514

## OR saveRDS() after
saveRDS(compact(environment(mycos)$output), "results.rds")
```
This just stores every tenth result in a list in the function environment, but you could easily replace that with a saveRDS() call, in which case you wouldn't need the output list at all.
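For the "every 10%" case from the question, a variant of the same idea could write everything computed so far to disk each time another tenth of the input is done. This is a minimal sketch under my own assumptions: with_checkpoint(), its arguments and the file name are made up for illustration, and rewriting the whole partial list at each checkpoint is only sensible if the results are reasonably small.

```r
library(purrr)

# Hypothetical wrapper: calls f, keeps every result, and rewrites `file`
# with all results so far after each `every` calls (default: 10% of n).
with_checkpoint <- function(f, n, file, every = ceiling(n / 10)) {
  force(f); force(n); force(file); force(every)
  results <- vector("list", n)
  i <- 0
  function(...) {
    i <<- i + 1
    val <- f(...)
    results[i] <<- list(val) # list() keeps NULL results in place
    if (i %% every == 0 || i == n) {
      saveRDS(results[seq_len(i)], file)
    }
    val
  }
}

ll <- 1:1000
ckpt <- file.path(tempdir(), "cos_partial.rds") # assumed location
res <- map(ll, with_checkpoint(cos, length(ll), ckpt))

# At any point (e.g. after a crash) the last checkpoint can be read back:
partial <- readRDS(ckpt)
length(partial)
```

If the job is interrupted, the file holds the results up to the last completed checkpoint, so at most one tenth of the work is lost.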