I have the following function to read a CSV file from Azure:
library(AzureStor)
library(data.table)

read_csv_from_azure <- function(file_path, container) {
  # Try to download the file and handle potential errors
  tryCatch({
    # Download the file from the Azure container as a raw vector
    downloaded_file <- storage_download(container, file_path, NULL)
    # Convert the raw data to a character string
    file_content <- rawToChar(downloaded_file)
    # Read the CSV content using data.table's fread
    data <- fread(text = file_content, sep = ",")
    # Return the data
    return(data)
  }, error = function(e) {
    # Print an error message if an exception occurs
    message("An error occurred while downloading or reading the file: ", conditionMessage(e))
    return(NULL)
  })
}
However, the performance of this function is not sufficient for my requirements; it takes too long to read a CSV file. The CSV files are around 30MB each.
How can I make it more efficient?
Thanks
Answers
As far as I know, the {arrow} package is significantly faster for reading CSV files in R. Try saving the file into a temporary directory, then reading it with arrow:
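A minimal sketch of that approach, assuming the same container and file_path objects (an AzureStor blob container and blob path) as in the question; the wrapper name read_csv_from_azure_arrow is only illustrative:

library(AzureStor)
library(arrow)

read_csv_from_azure_arrow <- function(file_path, container) {
  # Download to a temporary file on disk instead of into memory
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)
  storage_download(container, src = file_path, dest = tmp, overwrite = TRUE)
  # arrow's multithreaded CSV reader is typically much faster than
  # reading the content through a character string
  read_csv_arrow(tmp)
}

read_csv_arrow() returns a tibble by default; pass as_data_frame = FALSE if you would rather keep the result as an Arrow Table.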
You can use the code below to read a large CSV file in R. I agree with Bastián Olea Herrera's answer that the arrow package is faster for reading CSV files. If you suspect storage_download is causing the problem, you can use the httr package to download the file as a temporary file, read it with arrow, and delete the temporary file afterwards, authenticating with an Azure SAS token.

Code:
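A sketch of that flow, with placeholder values for the storage account, container name, blob path, and SAS token (replace them with your own):

library(httr)
library(arrow)

account   <- "mystorageaccount"   # placeholder storage account name
container <- "mycontainer"        # placeholder container name
blob_path <- "data/myfile.csv"    # placeholder path to the blob
sas_token <- "sv=..."             # placeholder SAS token, without the leading "?"

url <- sprintf("https://%s.blob.core.windows.net/%s/%s?%s",
               account, container, blob_path, sas_token)

# Download the blob to a temporary file, authenticating with the SAS token
tmp <- tempfile(fileext = ".csv")
resp <- GET(url, write_disk(tmp, overwrite = TRUE))
stop_for_status(resp)

# Read the CSV with arrow, then delete the temporary file
data <- read_csv_arrow(tmp)
unlink(tmp)
head(data)

head(data) just previews the first rows so you can confirm the file was read correctly.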
If you need to use the data.table package, you can use fread to read the CSV file directly from the downloaded file. This avoids converting it to a character string first, which can save both time and memory.
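For example, a minimal sketch that downloads the blob to a temporary file and reads it with fread, again assuming the container and file_path objects from the question:

library(AzureStor)
library(data.table)

read_csv_from_azure_dt <- function(file_path, container) {
  # Download straight to a temporary file on disk
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)
  storage_download(container, src = file_path, dest = tmp, overwrite = TRUE)
  # fread reads the file directly, skipping the rawToChar()/text = step
  fread(tmp, sep = ",")
}

fread can also detect the separator on its own, so sep = "," is optional here.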