
I have a function that bins datetimes with cut(). Unfortunately, it behaves differently on Ubuntu and Windows. Here’s a reproducible example of the problem as it appears on Windows: instead of the label 2023-01-03 00:00:00, we get 2023-01-03.

datetimes <- seq(from = as.POSIXct("2023-01-02 23:00:00", tz = "UTC"),
                  to = as.POSIXct("2023-01-03 01:00:00", tz = "UTC"),
                  by = "30 min")
# Apply cut function
cut_times <- cut(datetimes, breaks = "1 hour")
# Convert to data frame for easier viewing
df <- data.frame(original_times = datetimes, cut_times = cut_times)
# Print dataframe
print(df)
       original_times           cut_times
1 2023-01-02 23:00:00 2023-01-02 23:00:00
2 2023-01-02 23:30:00 2023-01-02 23:00:00
3 2023-01-03 00:00:00          2023-01-03
4 2023-01-03 00:30:00          2023-01-03
5 2023-01-03 01:00:00 2023-01-03 01:00:00

I tried this on two different Windows machines; here’s the session info for one of them:

sessioninfo::session_info()
─ Session info ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.0 (2023-04-21 ucrt)
 os       Windows 11 x64 (build 22000)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_United States.utf8
 ctype    English_United States.utf8
 tz       America/New_York
 date     2023-07-19
 pandoc   NA

─ Packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
 sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.1)

 [1] C:/Users/choilab/R/R-4.3.0/library

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Here’s the behavior I expect, which is what I get on Ubuntu, where I wrote the code:

> datetimes <- seq(from = as.POSIXct("2023-01-02 23:00:00", tz = "UTC"), 
+                  to = as.POSIXct("2023-01-03 01:00:00", tz = "UTC"), 
+                  by = "30 min")
> # Apply cut function
> cut_times <- cut(datetimes, breaks = "1 hour")
> # Convert to data frame for easier viewing
> df <- data.frame(original_times = datetimes, cut_times = cut_times)
> # Print dataframe
> print(df)
       original_times           cut_times
1 2023-01-02 23:00:00 2023-01-02 23:00:00
2 2023-01-02 23:30:00 2023-01-02 23:00:00
3 2023-01-03 00:00:00 2023-01-03 00:00:00
4 2023-01-03 00:30:00 2023-01-03 00:00:00
5 2023-01-03 01:00:00 2023-01-03 01:00:00

Here’s the session info for the Ubuntu machine:

sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.1.2 (2021-11-01)
 os       Ubuntu 22.04.2 LTS
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2023-07-19
 pandoc   2.9.2.1 @ /usr/bin/pandoc

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 cli           3.6.1   2023-03-23 [1] CRAN (R 4.1.2)
 sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.1.2)

 [1] /home/matias/R/x86_64-pc-linux-gnu-library/4.1
 [2] /usr/local/lib/R/site-library
 [3] /usr/lib/R/site-library
 [4] /usr/lib/R/library

──────────────────────────────────────────────────────────────────────────────

Because cut returns a factor, I later have to call as.POSIXct(as.character()) to convert my data back to datetimes. That obviously causes problems when the time has been dropped from some of the labels.

This is what I get on Windows. The time is dropped from every element:

as.POSIXct(as.character(df$cut_times), tz="UTC")
[1] "2023-01-02 UTC" "2023-01-02 UTC" "2023-01-03 UTC" "2023-01-03 UTC"
[5] "2023-01-03 UTC"
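
For whole-hour bins, one way to sidestep the factor round trip entirely is to truncate the datetimes in base R, so the result never passes through character labels. The following is only a sketch under that assumption (bins aligned to clock hours); `bin_start` and `formatted` are illustrative names, not part of the original code. Note also that the two sessions above differ in R version (4.3.0 on Windows, 4.1.2 on Ubuntu), so the difference may be tied to the R version rather than the operating system; that has not been verified here.

```r
# Same input as in the question
datetimes <- seq(from = as.POSIXct("2023-01-02 23:00:00", tz = "UTC"),
                 to = as.POSIXct("2023-01-03 01:00:00", tz = "UTC"),
                 by = "30 min")

# trunc() on a POSIXct vector returns POSIXlt truncated to the hour;
# as.POSIXct() converts it back and keeps the UTC time zone.
bin_start <- as.POSIXct(trunc(datetimes, units = "hours"))

# An explicit format string always prints the time, including midnight.
formatted <- format(bin_start, "%Y-%m-%d %H:%M:%S")

df <- data.frame(original_times = datetimes, bin_start = formatted)
print(df)
```

Because `bin_start` stays a POSIXct vector throughout, no as.POSIXct(as.character()) conversion is needed afterwards. For bin widths that are not a single clock unit (for example 2 hours), this sketch would need adjusting.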

2 Answers


  1. Chosen as BEST ANSWER

    Somebody recommended the clock package to me. In particular, clock::date_group(..., n, precision) can replace the cut call and keeps the result as a datetime. I haven't tested this exhaustively, but it seems to produce the same result on different machines.

    data.frame(original_times = datetimes, 
               cut_times = clock::date_group(datetimes, n = 15, precision = "minute"))
           original_times           cut_times
    1 2023-01-02 23:00:00 2023-01-02 23:00:00
    2 2023-01-02 23:30:00 2023-01-02 23:30:00
    3 2023-01-03 00:00:00 2023-01-03 00:00:00
    4 2023-01-03 00:30:00 2023-01-03 00:30:00
    5 2023-01-03 01:00:00 2023-01-03 01:00:00
    # change to 60, or whatever
    data.frame(original_times = datetimes, 
              cut_times = clock::date_group(datetimes, n = 60, precision = "minute"))
           original_times           cut_times
    1 2023-01-02 23:00:00 2023-01-02 23:00:00
    2 2023-01-02 23:30:00 2023-01-02 23:00:00
    3 2023-01-03 00:00:00 2023-01-03 00:00:00
    4 2023-01-03 00:30:00 2023-01-03 00:00:00
    5 2023-01-03 01:00:00 2023-01-03 01:00:00
    

  2. Paste 00:00:00 onto the end of each label. The pasted time is ignored for labels that already include a time.

    as.POSIXct(paste(cut_times, "00:00:00"))
    ## [1] "2023-01-02 23:00:00 EST" "2023-01-02 23:00:00 EST"
    ## [3] "2023-01-03 00:00:00 EST" "2023-01-03 00:00:00 EST"
    ## [5] "2023-01-03 01:00:00 EST"
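
The output of the paste approach above is in the session's local time zone (EST), while the data in the question are in UTC. A hedged variant, assuming date-only labels are exactly 10 characters long ("YYYY-MM-DD"), appends midnight only where the time is missing and parses with an explicit format and time zone. The names `labels`, `date_only`, and `restored` are illustrative.

```r
datetimes <- seq(from = as.POSIXct("2023-01-02 23:00:00", tz = "UTC"),
                 to = as.POSIXct("2023-01-03 01:00:00", tz = "UTC"),
                 by = "30 min")

# Factor labels from cut(), as plain character strings
labels <- as.character(cut(datetimes, breaks = "1 hour"))

# Assumption: a label without a time component is exactly "YYYY-MM-DD"
date_only <- nchar(labels) == 10
labels[date_only] <- paste(labels[date_only], "00:00:00")

# An explicit format and tz avoid both format guessing and the local time zone
restored <- as.POSIXct(labels, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
print(format(restored, "%Y-%m-%d %H:%M:%S"))
```

This should behave the same whether or not the labels already carry a time, since labels with a time are left untouched.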
    