
I have two components in Azure Machine Learning.
The first component (called prep) produces two dataframes that I want to pass to the next component (called middle) for further processing.

In the prep code, I have tried saving the dataframe to the component's outputs folder, to a datastore, and to the location passed in through the args parameters,
as shown below:

print(Path(args.Y_df) / "Y_df.csv")
# save to the component's ./outputs folder
df1.to_csv("./outputs/Y_df.csv")
# save to the path passed in through args
df1.to_csv(args.Y_df.path)
# save directly to a datastore URI
df1.to_csv("azureml://subscriptions/subscription_id/resourcegroups/rg_group/workspaces/workspace_name/datastores/datastore_name/paths/azureml/forecast/testing/y_df.csv")

Of these, only the first method works.
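One likely reason `df1.to_csv(args.Y_df.path)` fails: the prep script declares `--Y_df` with `type=str` (see the parser code further down), so `args.Y_df` is a plain string with no `.path` attribute. A minimal stdlib sketch of the difference, where `/tmp/example_mount/Y_df` is a made-up placeholder standing in for whatever path Azure ML actually substitutes for `${{outputs.Y_df}}` at runtime:

```python
import argparse
from pathlib import Path

parser = argparse.ArgumentParser("prep")
parser.add_argument("--Y_df", type=str, help="Path of prepped data")

# Simulate the command line built from "--Y_df ${{outputs.Y_df}}";
# the path is a hypothetical placeholder, not a real Azure ML mount point.
args = parser.parse_args(["--Y_df", "/tmp/example_mount/Y_df"])

# argparse returns a plain str, so there is no .path attribute on it
print(type(args.Y_df).__name__)    # str
print(hasattr(args.Y_df, "path"))  # False: args.Y_df.path raises AttributeError

# Build the target file path from the string instead
target = Path(args.Y_df) / "Y_df.csv"
print(target)                      # /tmp/example_mount/Y_df/Y_df.csv
```

In the real script, `df1.to_csv(target)` would then write into that location, which is the same pattern the second answer below uses.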
Now I want to pass these files to the next component, so in the pipeline definition code I have this:

def data_pipeline(
    compute_train_node: str,
):

    prep_node = prep()
    transform_node = middle(Y_df=prep_node.outputs.Y_df,
                            S_df=prep_node.outputs.S_df)

I am trying to run some basic code in the middle component, but it never starts and fails with an error.

Below are the YAML definitions for middle and prep:
middle:

name: middle4
display_name: middle4

inputs:
  Y_df:
    type: uri_file
  S_df:
    type: uri_file

code: ./middle

environment: azureml:environment_name:4

command: >-
  python middle_script.py
  --Y_df ${{inputs.Y_df}}
  --S_df ${{inputs.S_df}}

prep:

name: preprocessing24
display_name: preprocessing24

outputs:
  Y_df:
    type: uri_file

  S_df:
    type: uri_file

code: ./preprocessing

environment: azureml:environment_name:4

command: >-
  python preprocessing_script.py
  --Y_df ${{outputs.Y_df}} 
  --S_df ${{outputs.S_df}}

What am I doing wrong?
How do I pass a file from one component to the other?

Edit after trying the method from the answer:

Now args.Y_df points to a seemingly random (probably default) file path instead of the one I gave in the Output() function as described in the answer.
Saving then fails with this error:

OSError: Cannot save file into a non-existent directory: '/mnt/azureml/cr/j/32h438dshj537dj284ndhs630e1/cap/data-capability/wd/Y_df/testing'

Below is the code in the prep script that reads the path; this path is used to save the dataframes as CSV files.

import argparse

parser = argparse.ArgumentParser("prep")
parser.add_argument("--Y_df", type=str, help="Path of prepped data")
parser.add_argument("--S_df", type=str, help="Path of prepped data")
parser.add_argument("--clinical_actuals_path", type=str, help="Path of prepped data")
args = parser.parse_args()
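Two observations on this error, both inferences rather than anything the post confirms. First, a local `/mnt/azureml/...` value in `args.Y_df` is probably expected: the component is handed a local path that Azure ML maps to the configured output location, not the `azureml://` URI itself. Second, the failing directory ends in `Y_df/testing`, which suggests the script writes into a `testing` subfolder that does not exist yet; pandas' `to_csv` raises this OSError when the parent directory is missing. A minimal stdlib sketch of creating the folder before writing; the helper name `save_rows` and the sample rows are hypothetical, and the `testing` subfolder is taken from the error message. With pandas, the equivalent is calling `target.parent.mkdir(parents=True, exist_ok=True)` before `df1.to_csv(target)`.

```python
import csv
import tempfile
from pathlib import Path


def save_rows(rows, out_dir, subdir, filename):
    """Write rows as CSV to out_dir/subdir/filename, creating missing folders first."""
    target = Path(out_dir) / subdir / filename
    # Neither open() nor pandas' to_csv creates missing parent folders; without
    # this mkdir, to_csv raises "Cannot save file into a non-existent directory".
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    return target


# Usage: a temporary folder stands in for the mounted output path in args.Y_df
with tempfile.TemporaryDirectory() as mount:
    written = save_rows([["id", "value"], ["a", "1"]], mount, "testing", "Y_df.csv")
    print(written.relative_to(mount))  # testing/Y_df.csv
    print(written.exists())            # True
```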


Answers


  1. Chosen as BEST ANSWER

    Answering based on the information provided by JayashankarGS. His method solved almost the entire issue; I just added one extra parameter to the code he provided.

    from azure.ai.ml import MLClient, Input, Output
    
    def data_pipeline(
        compute_train_node: str,
    ):
    
        prep_node = prep()
    
        prep_node.outputs.Y_df = Output(type="uri_folder", mode="rw_mount", path="azureml://datastores/<datastore_name>/paths/csvs/Y_df/")
        prep_node.outputs.S_df = Output(type="uri_folder", mode="rw_mount", path="azureml://datastores/<datastore_name>/paths/csvs/S_df/")
    
        transform_node = middle(Y_df=prep_node.outputs.Y_df,
                                S_df=prep_node.outputs.S_df)
    

    This is the same code that JayashankarGS posted; I just added one more parameter to each Output() call:

    mode="rw_mount"
    

    This solved all the issues.


  2. You have to give a datastore path to the outputs of prep_node, like below.

    from azure.ai.ml import MLClient, Input, Output
    
    def data_pipeline(
        compute_train_node: str,
    ):
    
        prep_node = prep()
        
        prep_node.outputs.Y_df= Output(type="uri_folder", path="azureml://datastores/<datastore_name>/paths/csvs/Y_df/")
        prep_node.outputs.S_df= Output(type="uri_folder", path="azureml://datastores/<datastore_name>/paths/csvs/S_df/")
        
        transform_node = middle(Y_df=prep_node.outputs.Y_df,
                                S_df=prep_node.outputs.S_df)
    

    Here, I am assigning an Output object with a datastore path to Y_df and S_df.

    Next, save the CSV files in the prep component like below.

    from pathlib import Path
    
    df1.to_csv(Path(args.Y_df) / "Y_df.csv")
    
    df2.to_csv(Path(args.S_df) / "S_df.csv")
    

    If you prefer, you can also save both files into a single folder, give the prep component a single output, and access both files through that folder in the next component.
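A minimal stdlib sketch of that single-folder variant. It assumes one hypothetical `uri_folder` output on prep (called, say, `data`) wired to a matching `uri_folder` input on middle, for example `middle(data=prep_node.outputs.data)`; the name `data`, the helpers `prep_step` and `middle_step`, and the sample rows are all illustrative. In the real scripts, `out_dir` would be the path argparse receives for the output in prep and `in_dir` the path received for the input in middle; with pandas, the loops would use `df.to_csv(...)` and `pd.read_csv(...)` instead of the `csv` module.

```python
import csv
import tempfile
from pathlib import Path


def prep_step(out_dir, tables):
    """prep side: write every table as <name>.csv into one output folder."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, rows in tables.items():
        with (out / f"{name}.csv").open("w", newline="") as f:
            csv.writer(f).writerows(rows)


def middle_step(in_dir, names=("Y_df", "S_df")):
    """middle side: read the same files back from the folder it receives as input."""
    tables = {}
    for name in names:
        with (Path(in_dir) / f"{name}.csv").open(newline="") as f:
            tables[name] = list(csv.reader(f))
    return tables


# Usage: one temporary folder plays the role of the shared uri_folder output/input
with tempfile.TemporaryDirectory() as shared:
    prep_step(shared, {"Y_df": [["id", "y"], ["a", "1"]],
                       "S_df": [["id", "static"], ["a", "x"]]})
    result = middle_step(shared)
    print(sorted(result))     # ['S_df', 'Y_df']
    print(result["Y_df"][1])  # ['a', '1']
```

Keeping both files in one folder means only one output/input pair has to be declared in the YAML and wired in the pipeline, at the cost of both components agreeing on the file names inside it.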
