I have a function that relies on cutting datetimes into temporal bins with cut(). Unfortunately, it behaves differently on Ubuntu and Windows. Here's a reproducible example of the error (instead of getting 2023-01-03 00:00:00 we get 2023-01-03).
datetimes <- seq(from = as.POSIXct("2023-01-02 23:00:00", tz = "UTC"),
                 to = as.POSIXct("2023-01-03 01:00:00", tz = "UTC"),
                 by = "30 min")
# Apply cut function
cut_times <- cut(datetimes, breaks = "1 hour")
# Convert to data frame for easier viewing
df <- data.frame(original_times = datetimes, cut_times = cut_times)
# Print dataframe
print(df)
       original_times           cut_times
1 2023-01-02 23:00:00 2023-01-02 23:00:00
2 2023-01-02 23:30:00 2023-01-02 23:00:00
3 2023-01-03 00:00:00          2023-01-03
4 2023-01-03 00:30:00          2023-01-03
5 2023-01-03 01:00:00 2023-01-03 01:00:00
I tried this on two different Windows machines; here's the session info for one of them:
sessioninfo::session_info()
─ Session info ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.3.0 (2023-04-21 ucrt)
os Windows 11 x64 (build 22000)
system x86_64, mingw32
ui RTerm
language (EN)
collate English_United States.utf8
ctype English_United States.utf8
tz America/New_York
date 2023-07-19
pandoc NA
─ Packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.1)
[1] C:/Users/choilab/R/R-4.3.0/library
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Here's the behavior I expected, as I coded it on Ubuntu:
> datetimes <- seq(from = as.POSIXct("2023-01-02 23:00:00", tz = "UTC"),
+ to = as.POSIXct("2023-01-03 01:00:00", tz = "UTC"),
+ by = "30 min")
> # Apply cut function
> cut_times <- cut(datetimes, breaks = "1 hour")
> # Convert to data frame for easier viewing
> df <- data.frame(original_times = datetimes, cut_times = cut_times)
> # Print dataframe
> print(df)
       original_times           cut_times
1 2023-01-02 23:00:00 2023-01-02 23:00:00
2 2023-01-02 23:30:00 2023-01-02 23:00:00
3 2023-01-03 00:00:00 2023-01-03 00:00:00
4 2023-01-03 00:30:00 2023-01-03 00:00:00
5 2023-01-03 01:00:00 2023-01-03 01:00:00
Here's the session info for the Ubuntu machine:
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.1.2 (2021-11-01)
os Ubuntu 22.04.2 LTS
system x86_64, linux-gnu
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2023-07-19
pandoc 2.9.2.1 @ /usr/bin/pandoc
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
cli 3.6.1 2023-03-23 [1] CRAN (R 4.1.2)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.2)
[1] /home/matias/R/x86_64-pc-linux-gnu-library/4.1
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library
──────────────────────────────────────────────────────────────────────────────
Because cut() converts to a factor, I later have to call as.POSIXct(as.character()) to bring my data back to datetime, which obviously creates problems given that the time was dropped from some of the labels.
This is what I get on Windows. It completely drops the time of day from every element:
as.POSIXct(as.character(df$cut_times), tz="UTC")
[1] "2023-01-02 UTC" "2023-01-02 UTC" "2023-01-03 UTC" "2023-01-03 UTC"
[5] "2023-01-03 UTC"
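The loss of the time on every element, not just the midnight ones, can be reproduced without cut() at all. The following is a minimal sketch, assuming that as.POSIXct() without a format argument settles on a single format for the whole character vector; the labels vector is made up for illustration and mimics the factor labels from the Windows output.

```r
# Hypothetical labels mimicking the Windows output: one lacks the time part
labels <- c("2023-01-02 23:00:00", "2023-01-03")

# No format is given, so a single format is picked for the whole vector.
# The date-only format is the first one that parses every element,
# which means the time of the first element is lost as well.
mixed <- as.POSIXct(labels, tz = "UTC")
print(format(mixed, "%Y-%m-%d %H:%M:%S"))
```

If that assumption holds, a single date-only label is enough to strip the time from the entire column.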
2 Answers
Somebody recommended the clock package to me (docs here). In particular, clock::date_group(..., n, precision) can be used to replace the cut() call and keep the units datetime friendly. I haven't exhaustively tested this, but it seems to produce the same result on different machines.

Paste 00:00:00 onto the ends. It will be ignored on those components for which there is already a time.
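The pasting approach could look like the sketch below. It relies on strptime() ignoring any input left over once the format has been consumed. The labels vector is a stand-in for as.character(df$cut_times) on the affected machine, and the explicit format argument is an addition here (an assumption, not part of the suggestion above) to keep the parsing unambiguous.

```r
# Stand-in for as.character(df$cut_times) on the affected machine:
# the midnight bins are missing their time part
labels <- c("2023-01-02 23:00:00", "2023-01-02 23:00:00",
            "2023-01-03", "2023-01-03", "2023-01-03 01:00:00")

# Append a midnight time to every label. Labels that already carry a time
# end up with trailing text, which is ignored once the format is consumed.
padded <- paste(labels, "00:00:00")

# Explicit format (an assumption added for this sketch)
restored <- as.POSIXct(padded, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
print(format(restored, "%Y-%m-%d %H:%M:%S"))
```

Since the padding is harmless where a time is already present, the same conversion should be usable on both machines without branching on platform or R version.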