
Azure Machine Learning is more mature than Synapse Data Science in Fabric, which I have been using recently, so I wanted to know:

  • Is there a way to access OneLake data (Files or Tables) from an Azure ML instance within the same subscription?
  • If so, how? (I did not find any documentation or tutorial for this case. The closest thing I found is an explanation of how to make a model endpoint from Azure ML available in Fabric.)
  • What are the possible bottlenecks (for example, data transfer if I need to use an intermediate storage)?

Thanks in advance

2 Answers


  1. Chosen as BEST ANSWER

    FYI: I managed to do it as follows. The OneLake connector seems to struggle to match exactly the variables the API expects, so this is what worked in my case for reaching Tables in a OneLake lakehouse on Fabric:

    datastore.yml

    $schema: http://azureml/sdk-2-0/OneLakeDatastore.json
    name: datastore_name
    type: one_lake
    description: Credential-less datastore pointing to a Microsoft Fabric OneLake lakehouse
    one_lake_workspace_name: "workspace_name"
    endpoint: "onelake.dfs.fabric.microsoft.com"
    artifact:
      type: lake_house
      name: "OnelakeName.Lakehouse/Tables"
    

    In the cloud shell, after uploading the yml file:

    az ml datastore create --file datastore.yml --resource-group your_resource_group --workspace-name your_azureml_workspace
    

    Inside a notebook, using mltable:

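    !pip install mltable
    
    from mltable import from_delta_lake

    # abfss URL of the Delta table folder in OneLake (placeholder path)
    url = "abfss://path_of_the_resource_in_onelake/Tables/table"
    # Load the Delta table and materialize it as a pandas DataFrame
    df = from_delta_lake(url).to_pandas_dataframe()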
    

    Works like a charm!
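
The `abfss://` URL in the notebook snippet above is a placeholder. As a minimal sketch, assuming the usual OneLake layout of `abfss://<workspace>@<endpoint>/<item>.<itemtype>/<path>` and the same endpoint as in datastore.yml, the URL for a lakehouse table could be assembled like this. The helper name `onelake_abfss_url` and the example values are hypothetical, and workspace or lakehouse names containing spaces or special characters may need different handling:

```python
def onelake_abfss_url(
    workspace: str,
    lakehouse: str,
    table: str,
    endpoint: str = "onelake.dfs.fabric.microsoft.com",
) -> str:
    """Build the abfss URL of a Delta table under a lakehouse's Tables folder.

    Hypothetical helper: it only formats a string and does not check
    that the workspace, lakehouse, or table actually exists.
    """
    return f"abfss://{workspace}@{endpoint}/{lakehouse}.Lakehouse/Tables/{table}"


# Placeholder names taken from the datastore.yml in the answer above
url = onelake_abfss_url("workspace_name", "OnelakeName", "table")
print(url)
# abfss://workspace_name@onelake.dfs.fabric.microsoft.com/OnelakeName.Lakehouse/Tables/table
```

The resulting string can then be passed to `from_delta_lake(url)` as in the notebook snippet.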


  2. To access OneLake data from an Azure ML instance within the same subscription, you can use Azure Data Lake Storage Gen2 (ADLS Gen2) as the intermediate storage.

    1. Set up ADLS Gen2 and ensure Azure ML has access.
    2. Use the Azure ML SDK for Python to access data from ADLS Gen2.
    3. Bottlenecks: data transfer latency and costs.

    You can use the Datastore and Dataset classes in the Azure ML SDK to access data.
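
As a rough illustration of referencing data in a registered datastore from Python, Azure ML uses long-form datastore URIs of the form `azureml://subscriptions/<id>/resourcegroups/<rg>/workspaces/<ws>/datastores/<name>/paths/<path>`. The sketch below only builds such a string; the helper name `datastore_uri` and every value passed to it are hypothetical placeholders, and it assumes the path is resolved relative to the root of the datastore:

```python
def datastore_uri(
    subscription_id: str,
    resource_group: str,
    workspace: str,
    datastore: str,
    path: str,
) -> str:
    """Format a long-form Azure ML datastore URI (string formatting only)."""
    return (
        f"azureml://subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}"
        f"/workspaces/{workspace}"
        f"/datastores/{datastore}"
        f"/paths/{path.lstrip('/')}"  # drop a leading slash to avoid '//'
    )


# All values here are placeholders, not real resources
uri = datastore_uri(
    "your_subscription_id",
    "your_resource_group",
    "your_azureml_workspace",
    "datastore_name",
    "/folder/data",
)
print(uri)
```

A URI of this shape should be usable wherever the SDK or mltable accepts a datastore path, but that depends on the datastore type and the access rights of the compute identity.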

    I hope this helps 🙂
