Please consider the snippet at the end of the post.
I would like to be able to save (possibly as an RDS) the results of the computations while they progress (e.g. every time a new 10% of the list is processed). How can I do that?
library(tidyverse)
ll <- 1:1000
res <- map(ll, \(x) cos(x))
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Debian GNU/Linux 12 (bookworm)
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.11.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0
#>
#> locale:
#> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
#> [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
#> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Europe/Brussels
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
#> [5] purrr_1.0.2 readr_2.1.5 tidyr_1.3.1 tibble_3.2.1
#> [9] ggplot2_3.5.1 tidyverse_2.0.0
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.5 compiler_4.4.1 reprex_2.1.0 tidyselect_1.2.1
#> [5] scales_1.3.0 yaml_2.3.8 fastmap_1.1.1 R6_2.5.1
#> [9] generics_0.1.3 knitr_1.46 munsell_0.5.1 R.cache_0.16.0
#> [13] tzdb_0.4.0 pillar_1.9.0 R.utils_2.12.3 rlang_1.1.3
#> [17] utf8_1.2.4 stringi_1.8.4 xfun_0.43 fs_1.6.4
#> [21] timechange_0.3.0 cli_3.6.2 withr_3.0.0 magrittr_2.0.3
#> [25] digest_0.6.35 grid_4.4.1 hms_1.1.3 lifecycle_1.0.4
#> [29] R.methodsS3_1.8.2 R.oo_1.26.0 vctrs_0.6.5 evaluate_0.23
#> [33] glue_1.7.0 styler_1.10.3 fansi_1.0.6 colorspace_2.1-0
#> [37] rmarkdown_2.26 tools_4.4.1 pkgconfig_2.0.3 htmltools_0.5.8.1
Created on 2024-06-27 with reprex v2.1.0
2 Answers
Turns out there’s a package for that, currr ("checkpoint" + purrr). It doesn’t save precisely in the form you specified (but see below for how to access intermediate results), but it provides functions (cp_map() for example) that checkpoint a map as it runs.

cp_map() has a cp_option= argument that allows you to specify how often to checkpoint (i.e., how many checkpoints per job) and where to store the results.

If you want to look at these intermediate outputs directly (rather than using them via the package as an automated checkpointing system) you’ll have to figure out what the files it writes are: it looks like the out* files are storing chunks of output (e.g. out_301.rds has the results for cos(301:400)).

Another option is to create a functional operator and store the output in the function environment. Then you can retrieve this information afterwards (or save it using saveRDS()). Such a wrapper just saves the output to a list object in the function environment, but you could easily replace that with a saveRDS() call. Then you wouldn’t need to create the output list object.
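
The functional-operator idea described above could be sketched roughly as follows. This is only an illustration under some assumptions: checkpointed() is a hypothetical helper (not part of purrr or currr), the path argument and the "every N calls" rule are made up for the example, and the snapshot is written to a temporary file. It combines both variants from the answer: results are kept in a list in the function environment, and that list is also written out with saveRDS() each time another 10% of the input has been processed.

```r
library(purrr)

# Hypothetical helper: wraps `f` so that each result is appended to a list
# living in the wrapper's enclosing environment. After every `every` calls,
# the results collected so far are written to `path` as an RDS file.
checkpointed <- function(f, path, every) {
  output <- list()
  function(x, ...) {
    res <- f(x, ...)
    # Append as a one-element list so that NULL results are kept too
    output[length(output) + 1L] <<- list(res)
    if (length(output) %% every == 0L) {
      saveRDS(output, path)
    }
    res
  }
}

ll <- 1:1000
path <- tempfile(fileext = ".rds")  # assumed location for the snapshot

# Checkpoint after each 10% of the list
cos_cp <- checkpointed(cos, path, every = length(ll) %/% 10)

res <- map(ll, cos_cp)

# Results stored in the function environment
stored <- environment(cos_cp)$output

# Most recent snapshot written to disk
snapshot <- readRDS(path)
```

If the job is interrupted, readRDS(path) should return whatever had been processed at the last checkpoint. Overwriting one file keeps things simple; writing a separate file per chunk would avoid re-saving earlier results, at the cost of having to reassemble them afterwards.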