Using the RunConfiguration() class, I passed my custom Dockerfile to set up the environment for the python script as follows:
from azureml.core.runconfig import RunConfiguration, DockerConfiguration

rc = RunConfiguration()
#rc.environment.use_docker = True
rc.docker = DockerConfiguration(use_docker=True)
rc.environment.from_dockerfile("webscraping_env", "./Dockerfile")
In the serialized config of my rc, I can see that the docker section is:
"docker": {
"arguments": [],
"baseDockerfile": "FROM python:3.8nnRUN apt-get update nRUN apt-get install -y gconf-service libasound2 libatk1.0-0 libcairo2 libcups2 libfontconfig1 libgdk-pixbuf2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libxss1 fonts-liberation libappindicator1 libnss3 lsb-release xdg-utilsnn#download and install chromenRUN wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.debnRUN dpkg -i google-chrome-stable_current_amd64.deb; apt-get -fy installnn# install chromedrivernRUN apt-get install -yqq unzipnRUN wget -O /tmp/chromedriver.zip http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zipnRUN unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/nnENV DISPLAY=:99nnRUN pip install selenium pandas bs4 lxml",
"baseImage": null,
"baseImageRegistry": {
"address": null,
"password": null,
"registryIdentity": null,
"username": null
},
"buildContext": null,
"enabled": false,
"platform": {
"architecture": "amd64",
"os": "Linux"
},
"sharedVolumes": true,
"shmSize": "2g"
}
and the python section looks like this:
"python": {
"baseCondaEnvironment": null,
"condaDependencies": {
"channels": [
"anaconda",
"conda-forge"
],
"dependencies": [
"python=3.8.13",
{
"pip": [
"azureml-defaults"
]
}
],
"name": "project_environment"
},
"condaDependenciesFile": null,
"interpreterPath": "python",
"userManagedDependencies": true
}
And when I submit my pipeline, which consists of a single step that performs web scraping using selenium and bs4:
step = PythonScriptStep(
script_name="./webscraping-script.py",
source_directory=".",
arguments=["--output_path", webscrape_ouput],
outputs=[webscrape_ouput],
compute_target=AmlCompute(ws, "webscrape-nb"),
runconfig=rc,
allow_reuse=False)
I get an ImportError inside webscraping-script.py informing me that selenium cannot be found, and the pipeline run stops. I suspect that my Dockerfile is not being used as the environment for running the script.

My question: how do I achieve this? I cannot find any argument on PythonScriptStep that accepts an environment directly, the way you pass an environment argument to ParallelRunConfig when setting up a ParallelRunStep. I was expecting my Dockerfile to be used as the environment for the python script.
2 Answers
You can try adding the selenium package to the conda_dependencies of your RunConfiguration object before submitting the pipeline.

Alternatively, you can register your custom environment from a Dockerfile and then configure RunConfiguration to use this custom environment. What is nice about registering your environment is that you can easily reuse it.
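As a sketch of that first suggestion, assuming the SDK v1 CondaDependencies class (note this only takes effect when AzureML manages the environment, i.e. user_managed_dependencies is False):

```python
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import RunConfiguration

rc = RunConfiguration()

# Declare the pip packages the script imports.
deps = CondaDependencies()
deps.add_pip_package("selenium")
deps.add_pip_package("bs4")
rc.environment.python.conda_dependencies = deps

# AzureML only materializes these conda dependencies when it manages
# the environment, so user-managed dependencies must stay disabled:
rc.environment.python.user_managed_dependencies = False
```

This only covers the pip packages, though; it does not install Chrome or chromedriver the way the Dockerfile does, so the second approach below is the more complete fix.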
Here is the python script to create an environment:
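A sketch of such a registration script, assuming SDK v1's Environment.from_docker_build_context and DockerBuildContext helpers (the path value is a placeholder you must fill in yourself):

```python
from azureml.core import Workspace, Environment
from azureml.core.environment import DockerBuildContext

ws = Workspace.from_config()

# The build context folder must contain the Dockerfile plus any files it copies.
build_context = DockerBuildContext.from_local_directory(
    workspace=ws,
    path="<path-to-your-docker-build-context>",
    dockerfile_path="Dockerfile",
)

env = Environment.from_docker_build_context(
    name="webscraping_env",
    docker_build_context=build_context,
)

# The Dockerfile already installs selenium, bs4, etc., so tell AzureML not to
# build a conda environment on top of the image:
env.python.user_managed_dependencies = True

env.register(workspace=ws)
```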
path=<path-to-your-docker-build-context> should be the path of the folder containing the files required to build your image, together with your Dockerfile. After running this script, you should be able to see your environment in the Environments tab of AzureML studio. Then you can retrieve this environment and set up the
rc.environment
value:
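For example (a sketch, assuming the environment was registered as webscraping_env in the same workspace):

```python
from azureml.core import Workspace, Environment
from azureml.core.runconfig import RunConfiguration

ws = Workspace.from_config()
env = Environment.get(workspace=ws, name="webscraping_env")

rc = RunConfiguration()
rc.environment = env
```

With rc.environment pointing at the registered environment, the same rc can then be passed to PythonScriptStep via its runconfig argument, which is how the step ends up running inside your custom image.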