
I’m working on some interactive development in an Azure Machine Learning notebook, and I’d like to save data from a pandas DataFrame directly to a CSV file in the default blob storage account connected to my workspace. I’m currently loading data the following way:

import pandas as pd

uri = f"azureml://subscriptions/<sub_id>/resourcegroups/<res_grp>/workspaces/<workspace>/datastores/<datastore_name>/paths/<path_on_datastore>"
df = pd.read_csv(uri)

I have no problem loading this data, but after some basic transformations I’d like to save it back to my storage account. Most, if not all, solutions I have found suggest saving the file to a local directory and then uploading that saved file to the storage account. The best solution I have found is the following, which uses tempfile so I don’t have to go and delete any ‘local’ files afterwards:

from azureml.core import Workspace
import tempfile

ws = Workspace.from_config()
datastore = ws.datastores.get("exampleblobstore")

with tempfile.TemporaryDirectory() as tmpdir:
    tmpath = f"{tmpdir}/example_file.csv"
    df.to_csv(tmpath, index=False)  # write the DataFrame before uploading
    # target_path is the directory on the datastore; the file keeps its own name
    datastore.upload_files([tmpath], target_path="path/to", overwrite=True)

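The local write-and-cleanup half of this pattern can be exercised without any Azure dependency; a minimal sketch (the column names and values here are made up for illustration):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

with tempfile.TemporaryDirectory() as tmpdir:
    tmpath = os.path.join(tmpdir, "example_file.csv")
    df.to_csv(tmpath, index=False)       # the file the upload step would send
    round_trip = pd.read_csv(tmpath)     # sanity-check the file before "uploading"

# once the context exits, the directory and the file are gone
```

Reading the file back inside the context confirms the CSV is complete before any upload happens; after the `with` block, `tempfile` has already removed everything.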
This is a reasonable solution, but I’m wondering if there is any way I can directly write to my storage account without the need to save the file first. Ideally I’d like to do something as simple as:

target_uri = f"azureml://subscriptions/<sub_id>/resourcegroups/<res_grp>/workspaces/<workspace>/datastores/<datastore_name>/paths/<path_on_datastore>"
df.to_csv(target_uri)

After some reading I thought the AzureMachineLearningFileSystem class might let me read and write data to my datastore, much as I would on a local machine. However, it appears this class only lets me inspect the ‘file system’ and read data from it, not write.



  1. You can use the fsspec and adlfs packages to write data to a storage account with proper authentication.

    Install packages
    pip install azureml-fsspec adlfs

    Refer to the adlfs documentation for the supported authentication options.

    st = {'account_key': '<account_key>'}
    ## or st = {'sas_token': '<sas_token>'}
    ## or st = {'connection_string': '<connection_string>'}
    ## or st = {'tenant_id': '<tenant_id>', 'client_id': '<client_id>', 'client_secret': '<client_secret>'}
    df.to_csv('abfs://<container_name>@<storage_account_name>.dfs.core.windows.net/<path_to_file>.csv', storage_options=st)

    Here you need to pass storage options for authentication; I would recommend using a SAS token or a service principal.
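Putting the pieces together, a template for the service-principal variant might look like the following. This is a sketch with placeholder values, not runnable as-is; it assumes adlfs is installed and that the service principal has a data-plane role (e.g. Storage Blob Data Contributor) on the account:

```python
# Placeholder credentials; substitute real values from your Azure AD app registration.
st = {
    "tenant_id": "<tenant_id>",
    "client_id": "<client_id>",
    "client_secret": "<client_secret>",
}

target = "abfs://<container_name>@<storage_account_name>.dfs.core.windows.net/path/to/target.csv"

# pandas hands the abfs:// URL and storage_options to adlfs via fsspec:
# df.to_csv(target, index=False, storage_options=st)
```

Note the write goes through the blob/ADLS endpoint directly, so it bypasses the AzureML datastore abstraction entirely; the same file will still be visible through the datastore afterwards.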


