
I’m working on some interactive development in an Azure Machine Learning notebook and I’d like to save data directly from a pandas DataFrame to a CSV file in my default connected blob storage account. I’m currently loading data the following way:

import pandas as pd

# Placeholders in angle brackets are filled in with real values
uri = "azureml://subscriptions/<sub_id>/resourcegroups/<res_grp>/workspaces/<workspace>/datastores/<datastore_name>/paths/<path_on_datastore>"
df = pd.read_csv(uri)

I have no problem loading this data, but after some basic transformations I’d like to save it back to my storage account. Most, if not all, solutions I have found suggest saving the file to a local directory and then uploading that saved file to the storage account. The best solution I have found is the following, which uses tempfile so I don’t have to go and delete any ‘local’ files afterwards:

from azureml.core import Workspace
import tempfile

ws = Workspace.from_config()
datastore = ws.datastores.get("exampleblobstore")

with tempfile.TemporaryDirectory() as tmpdir:
    # Write to a temporary local file that is cleaned up automatically
    tmp_path = f"{tmpdir}/target.csv"
    df.to_csv(tmp_path)
    # target_path is the directory on the datastore; the uploaded file keeps its local name
    datastore.upload_files([tmp_path], target_path="path/to", overwrite=True)

This is a reasonable solution, but I’m wondering if there is any way to write directly to my storage account without having to save a file first. Ideally I’d like to do something as simple as:

target_uri = f"azureml://subscriptions/<sub_id>/resourcegroups/<res_grp>/workspaces/<workspace>/datastores/<datastore_name>/paths/<path_on_datastore>"
df.to_csv(target_uri)

After some reading I thought the AzureMachineLearningFileSystem class might let me read and write data to my datastore, much as I would when developing on a local machine. However, it appears this class only lets me inspect the ‘file system’ and read data from it, not write to it.
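
For context, this is roughly the read-only usage I mean (a minimal sketch; it assumes the azureml-fsspec package is installed, and the placeholder IDs are mine):

from azureml.fsspec import AzureMachineLearningFileSystem
import pandas as pd

# Point the filesystem at the datastore root (no /paths/... suffix needed here)
fs = AzureMachineLearningFileSystem(
    "azureml://subscriptions/<sub_id>/resourcegroups/<res_grp>/workspaces/<workspace>/datastores/<datastore_name>"
)

fs.ls()  # inspecting the ‘file system’ works
with fs.open("<path_on_datastore>") as f:  # reading works
    df = pd.read_csv(f)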

2 Answers


  1. You can use the fsspec and adlfs packages to write data to a storage account with proper authentication.

    Install the packages:
    pip install azureml-fsspec adlfs

    Refer to this documentation.

    st = {'account_key': '<account_key>'}
    # or st = {'sas_token': '<sas_token_value>'}
    # or st = {'connection_string': '<connection_string_value>'}
    # or st = {'tenant_id': '<tenant_id_value>', 'client_id': '<client_id_value>', 'client_secret': '<client_secret_value>'}

    df.to_csv('abfs://<container_name>@<storage_account_name>.dfs.core.windows.net/local/tmp.csv', storage_options=st)
    

    Here, you need to pass storage options for authentication; I would recommend using a SAS token or a service principal.
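
    For example, a service principal can also be passed as an azure.identity credential object rather than raw secret strings (a sketch, assuming the azure-identity package is installed, that adlfs accepts a credential object in storage_options, and that the principal has a data-plane role such as Storage Blob Data Contributor):

    from azure.identity import ClientSecretCredential

    # Hypothetical service principal values; replace with your own
    cred = ClientSecretCredential(
        tenant_id='<tenant_id_value>',
        client_id='<client_id_value>',
        client_secret='<client_secret_value>',
    )
    st = {'credential': cred}

    df.to_csv('abfs://<container_name>@<storage_account_name>.dfs.core.windows.net/local/tmp.csv', storage_options=st)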

