The GUI in Azure Machine Learning makes creating datasets straightforward, but I'm having a hard time doing the same through Python code. I'm using the Python 3.8 Azure ML kernel. Here is my code, but it fails with an error that I haven't been able to debug:
from azureml.core import Workspace, Dataset
from azureml.contrib.dataset import FileHandlingOption
from azureml.contrib.dataset.labeled_dataset import _LabeledDatasetFactory
# Authenticate and create a workspace object
ws = Workspace.from_config()
# Get a reference to the registered dataset
dataset = Dataset.get_by_name(ws, 'my-registered-dataset')
# Create a labeled dataset factory
labeled_dataset_factory = _LabeledDatasetFactory()
# Create the labeling project
project = labeled_dataset_factory.from_input_data_reference(
    dataset.as_named_input('my_data').as_download(),
    label_column_name='my-label-column',
    file_handling_option=FileHandlingOption.SKIP_DOWNLOAD
)
# Register the labeling project
project.register(ws)
I’m receiving this error message:
AttributeError: '_LabeledDatasetFactory' object has no attribute 'from_input_data_reference'
What attribute should I use here to get this running?
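One way to see which construction methods the installed version of a factory class actually exposes is to introspect it instead of guessing attribute names. The sketch below uses only the standard-library inspect module. The helper name list_public_methods is hypothetical, and it is demonstrated on a standard-library class so it runs without the Azure ML SDK. With azureml-contrib-dataset installed, the assumption is that you would pass _LabeledDatasetFactory instead and look at the from_* entries.

```python
import inspect


def list_public_methods(cls):
    """Return the sorted names of the public callable attributes of cls.

    Hypothetical helper for illustration: names starting with an underscore
    are skipped, and only attributes that are callable (methods, static
    methods, class methods, nested classes) are kept.
    """
    names = []
    for name, member in inspect.getmembers(cls):
        if name.startswith("_"):
            continue
        if callable(member):
            names.append(name)
    return sorted(names)


if __name__ == "__main__":
    # Demonstrated on a standard-library class so the sketch runs anywhere.
    from collections import OrderedDict
    print(list_public_methods(OrderedDict))

    # Assumption: with azureml-contrib-dataset installed, the same helper shows
    # which from_* methods your SDK version really provides, for example:
    # from azureml.contrib.dataset.labeled_dataset import _LabeledDatasetFactory
    # print([m for m in list_public_methods(_LabeledDatasetFactory) if m.startswith("from_")])
```

If from_input_data_reference is missing from that list, the AttributeError is expected for your installed version, and the list tells you which alternatives exist.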
2 Answers
Could you please share which document you are following to create a data labeling project? Are you using the v1 or v2 SDK?
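If you are not sure which SDK generation your kernel has installed, you can check from a notebook cell. This is a minimal sketch using the standard-library importlib.metadata module (available since Python 3.8). It assumes the v1 SDK is distributed as azureml-core, the contrib module imported in the question as azureml-contrib-dataset, and the v2 SDK as azure-ai-ml; the helper name installed_azureml_sdks is hypothetical.

```python
from importlib import metadata

# Assumed distribution names for the Azure ML Python SDK packages.
SDK_PACKAGES = {
    "azureml-core": "v1 SDK",
    "azureml-contrib-dataset": "v1 contrib (azureml.contrib.dataset)",
    "azure-ai-ml": "v2 SDK",
}


def installed_azureml_sdks():
    """Map each package name to its installed version string, or None if absent."""
    versions = {}
    for package in SDK_PACKAGES:
        try:
            versions[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            versions[package] = None
    return versions


if __name__ == "__main__":
    for package, version in installed_azureml_sdks().items():
        status = version if version is not None else "not installed"
        print(f"{SDK_PACKAGES[package]} ({package}): {status}")
```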
In my experience, the quickest way is to create it in Azure Machine Learning Studio, as described in this document: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-image-labeling-projects
You can try replacing some lines of your code with the following.
Import package:
Create a labeled dataset factory:
Create the labeling project using the .from_dataset() method instead of .from_input_data_reference: