I’m working on some interactive development in an Azure Machine Learning notebook and I’d like to save some data directly from a pandas DataFrame to a CSV file in my default connected blob storage account. I’m currently loading some data the following way:
import pandas as pd
uri = f"azureml://subscriptions/<sub_id>/resourcegroups/<res_grp>/workspaces/<workspace>/datastores/<datastore_name>/paths/<path_on_datastore>"
df = pd.read_csv(uri)
I have no problem loading this data, but after some basic transformations I’d like to save it to my storage account. Most, if not all, of the solutions I have found suggest saving the file to a local directory and then uploading the saved file to my storage account. The best solution I have found is the following, which uses tempfile so I don’t have to go and delete any ‘local’ files afterwards:
from azureml.core import Workspace
import tempfile

ws = Workspace.from_config()
datastore = ws.datastores.get("exampleblobstore")

with tempfile.TemporaryDirectory() as tmpdir:
    tmpath = f"{tmpdir}/example_file.csv"
    df.to_csv(tmpath)
    datastore.upload_files([tmpath], target_path="path/to/target.csv", overwrite=True)
This is a reasonable solution, but I’m wondering if there is any way I can directly write to my storage account without the need to save the file first. Ideally I’d like to do something as simple as:
target_uri = f"azureml://subscriptions/<sub_id>/resourcegroups/<res_grp>/workspaces/<workspace>/datastores/<datastore_name>/paths/<path_on_datastore>"
df.to_csv(target_uri)
After some reading I thought the class AzureMachineLearningFileSystem might allow me to read and write data to my datastore, much as I would when developing on a local machine. However, it appears this class will not let me write data, only inspect the ‘file system’ and read data from it.
2 Answers
You can do it using a Blob Client.
See: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-upload-python
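As a rough illustration of the Blob Client route, the sketch below serializes the DataFrame to CSV in memory and uploads the bytes, so no local file is written. It assumes the azure-storage-blob package is installed and that you have a connection string for the storage account; the helper names (dataframe_to_csv_bytes, upload_dataframe, make_blob_client) are made up for this example and are not part of any SDK.

```python
def dataframe_to_csv_bytes(df, **to_csv_kwargs):
    """Serialize a DataFrame to CSV entirely in memory."""
    # With no path argument, DataFrame.to_csv returns the CSV text as a str.
    to_csv_kwargs.setdefault("index", False)
    return df.to_csv(**to_csv_kwargs).encode("utf-8")


def upload_dataframe(df, blob_client):
    """Upload df as CSV through an azure.storage.blob.BlobClient-like object."""
    data = dataframe_to_csv_bytes(df)
    # overwrite=True replaces an existing blob with the same name.
    blob_client.upload_blob(data, overwrite=True)
    return len(data)


def make_blob_client(connection_string, container_name, blob_name):
    """Build a BlobClient; all three arguments are values you must supply."""
    # Imported lazily so the helpers above do not need the SDK at import time.
    from azure.storage.blob import BlobClient

    return BlobClient.from_connection_string(
        conn_str=connection_string,
        container_name=container_name,
        blob_name=blob_name,
    )
```

Usage would look something like upload_dataframe(df, make_blob_client(conn_str, "my-container", "path/to/target.csv")), where conn_str and the container name are placeholders for your own values.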
You can use the fsspec and adlfs packages to write data to the storage account with proper authentication. Install the packages:
pip install azureml-fsspec adlfs
Refer to this documentation.
Here, you need to pass storage options for authentication; I would recommend using a sas_token or a service principal.
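The answer's original snippet did not survive, so the following is only a sketch of what this approach might look like. It assumes adlfs is installed (which registers the abfs:// protocol with fsspec) and a pandas version whose to_csv accepts storage_options (1.2 or later). The helper names abfs_url and write_csv_to_blob are hypothetical, and the account name, container, and SAS token are placeholders you would supply.

```python
def abfs_url(container, blob_path):
    """Build an abfs:// URL of the form abfs://<container>/<path>."""
    return f"abfs://{container}/{blob_path.lstrip('/')}"


def write_csv_to_blob(df, container, blob_path, account_name, sas_token):
    """Write df straight to blob storage via fsspec/adlfs, with no local file."""
    # adlfs reads these keys from storage_options. For a service principal,
    # pass tenant_id, client_id and client_secret instead of sas_token.
    storage_options = {"account_name": account_name, "sas_token": sas_token}
    url = abfs_url(container, blob_path)
    # pandas hands the URL and storage_options to fsspec, which opens the
    # remote file through adlfs.
    df.to_csv(url, index=False, storage_options=storage_options)
    return url
```

A call might look like write_csv_to_blob(df, "my-container", "path/to/target.csv", "mystorageaccount", sas_token). Note that this targets the storage account directly by container and path rather than going through the azureml:// datastore URI.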