I am using the mltable library on an AzureML notebook.

I can successfully load a local CSV file as an mltable:

from mltable import from_delimited_files
paths = [{'file': "dati_estra_test.csv"}]
dati = from_delimited_files(paths)

And I can view it as a pandas DataFrame.

Is there a way to save this table as an MLTable artifact?
Or to register it as an mltable AzureML dataset?

2 Answers


  1. Use the code block below to download the file.

    from azureml.core import Workspace, Dataset
    
    subscription_id = 'subscription'
    resource_group = 'your RG'
    workspace_name = 'nov21'
    
    workspace = Workspace(subscription_id, resource_group, workspace_name)
    
    dataset = Dataset.get_by_name(workspace, name='churn')
    dataset.to_pandas_dataframe()
    
    dataset.to_pandas_dataframe(on_error='null', out_of_range_datetime='null')
    
    dataset.download('Churn', target_path='df.csv', overwrite=False, ignore_not_found=True)
    

    This downloads the file to the specified folder.

  2. In mltable version 1.0.0, a save method was introduced that will write out the MLTable file:

    https://learn.microsoft.com/python/api/mltable/mltable.mltable.mltable?view=azure-ml-py#mltable-mltable-mltable-save

    Artifacts should be stored in a folder, so first create a folder to hold dati_estra_test.csv:

    # create directory
    mkdir dati_estra_test
    
    # move csv to directory
    mv dati_estra_test.csv dati_estra_test
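
    If you are working in a notebook cell rather than a terminal, the same step could be done from Python. The following is only a sketch using the standard library; the helper name `move_into_folder` is hypothetical and not part of mltable or the Azure ML SDK.

```python
import shutil
from pathlib import Path


def move_into_folder(csv_path: str, folder: str) -> Path:
    """Create `folder` if it does not exist and move the CSV file into it."""
    target_dir = Path(folder)
    # parents=True / exist_ok=True behaves like `mkdir -p`
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / Path(csv_path).name
    # equivalent of `mv <csv_path> <folder>`
    shutil.move(str(csv_path), str(destination))
    return destination
```

    Calling `move_into_folder("dati_estra_test.csv", "dati_estra_test")` should leave the CSV at `dati_estra_test/dati_estra_test.csv`.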
    

    Next, create/save the MLTable file using the SDK:

    import mltable
    import os
    
    # change the working directory to the data directory
    os.chdir("./dati_estra_test")
    
    # define the path relative to the MLTable file
    path = {
        'file': './dati_estra_test.csv'
    }
    
    # load the delimited (CSV) file
    tbl = mltable.from_delimited_files(paths=[path])
    
    # show the first few records
    tbl.show()
    
    # save the MLTable file in the data directory
    tbl.save(".")
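
    After saving, the data directory should contain both the CSV and a file named `MLTable`. A quick way to confirm this before registering the folder might look like the sketch below; `missing_artifact_files` is a hypothetical helper written for illustration, not an mltable function.

```python
from pathlib import Path
from typing import List


def missing_artifact_files(folder: str, data_files: List[str]) -> List[str]:
    """Return the expected file names that are not present in `folder`."""
    root = Path(folder)
    # the saved table definition lives in a file named "MLTable" (no extension)
    expected = ["MLTable", *data_files]
    return [name for name in expected if not (root / name).is_file()]
```

    An empty list from `missing_artifact_files("./dati_estra_test", ["dati_estra_test.csv"])` suggests the folder is ready to be registered. Note that the path is resolved against the current working directory, which the snippet above changed with `os.chdir`.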
    

    You can then create a data asset using either the CLI (note that the path should point to the artifact folder):

    az ml data create --name dati_estra_test --version 1 --type mltable --path ./dati_estra_test
    

    Or the Python SDK:

    from azure.ai.ml.entities import Data
    from azure.ai.ml.constants import AssetTypes
    
    my_path = './dati_estra_test'
    
    my_data = Data(
        path=my_path,
        type=AssetTypes.MLTABLE,
        name="dati_estra_test",
        version='1'
    )
    
    # ml_client is assumed to be an authenticated azure.ai.ml.MLClient for your workspace
    ml_client.data.create_or_update(my_data)
    

    When the asset is created, your artifact is automatically uploaded to cloud storage (the default Azure ML datastore).

    Note that you are not required to use Azure ML Tables (mltable) just because your data is tabular. You can use the Azure ML File (uri_file) and Folder (uri_folder) types and provide your own parsing logic to materialize the data into a Pandas or Spark data frame. If you have a simple CSV file or Parquet folder, you’ll probably find it easier to use Azure ML Files/Folders rather than Tables.
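
    As a rough illustration of "your own parsing logic" for a single CSV file, here is a minimal sketch using only the standard library. The function name `read_csv_rows` and its `columns` parameter are assumptions made for this example; in a real job the path would come from the uri_file input, and `pandas.read_csv` would usually be the more natural choice.

```python
import csv
from typing import Dict, List, Optional


def read_csv_rows(csv_path: str,
                  columns: Optional[List[str]] = None) -> List[Dict[str, str]]:
    """Parse a CSV file into row dicts, optionally keeping only some columns."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # the first line of the file is used as the header
        rows = list(csv.DictReader(handle))
    if columns is None:
        return rows
    # keep only the requested columns, in the requested order
    return [{name: row[name] for name in columns} for row in rows]
```

    The resulting list of dicts can be passed to `pandas.DataFrame(rows)` if a data frame is needed. All values are kept as strings, so any type conversion is left to the caller.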

    You’ll find Azure ML Tables (mltable) to be much more useful when you’re faced with the following scenarios:

    • The schema of your data is complex and/or changes frequently.
    • You only need a subset of data (for example: a sample of rows or files, specific columns, etc.).