
I am using the transformers pipeline to perform sentiment analysis on sample texts in 6 different languages. I tested the code in my local JupyterHub and it worked fine. But when I wrap it in a Flask application and build a Docker image out of it, execution hangs at the pipeline inference line and takes forever to return the sentiment scores.

  • macOS Catalina 10.15.7 (no GPU)
  • Python version: 3.8
  • Transformers version: 4.4.2
  • torch version: 1.6.0
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
results = classifier(["We are very happy to show you the Transformers library.", "We hope you don't hate it."])
print([i['score'] for i in results])

The above code works fine in a Jupyter notebook and gives the expected result:

[0.7495927810668945, 0.2365245819091797]

But when I create a Docker image with the Flask wrapper, it gets stuck at the results = classifier([input_data]) line and the execution runs forever.

My folder structure is as follows:

- src
    |-- app
         |--main.py
    |-- Dockerfile
    |-- requirements.txt

I used the following Dockerfile to build the image:

FROM tiangolo/uwsgi-nginx-flask:python3.8
COPY ./requirements.txt /requirements.txt
COPY ./app /app
WORKDIR /app
RUN pip install -r /requirements.txt
RUN echo "uwsgi_read_timeout 1200s;" > /etc/nginx/conf.d/custom_timeout.conf

And my requirements.txt file is as follows:

pandas==1.1.5
transformers==4.4.2
torch==1.6.0

My main.py script looks like this:

from flask import Flask, json, request, jsonify
import traceback
import pandas as pd
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline


app = Flask(__name__)
app.config["JSON_SORT_KEYS"] = False

model_name = 'nlptown/bert-base-multilingual-uncased-sentiment'
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
nlp = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

@app.route("/")
def hello():
    return "Model: Sentiment pipeline test"


@app.route("/predict", methods=['POST'])
def predict():

    json_request = request.get_json(silent=True)
    input_list = [i['text'] for i in json_request["input_data"]]
    
    results = nlp(input_list)         ##########  Getting stuck here
    for result in results:
        print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
    score_list = [round(i['score'], 4) for i in results]
    
    return jsonify(score_list)

if __name__ == "__main__":
    app.run(host='0.0.0.0', debug=False, port=80)

My input payload is of the form

{"input_data" : [{"text" : "We are very happy to show you the Transformers library."},
                 {"text" : "We hope you don't hate it."}]}
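
The extraction inside predict() can be checked against this payload in isolation, using only the standard library's json module (no Flask or model needed):

```python
import json

# The same payload shape the /predict endpoint receives
payload = json.loads('''
{"input_data": [{"text": "We are very happy to show you the Transformers library."},
                {"text": "We hope you don't hate it."}]}
''')

# Mirrors the list comprehension in predict(): one string per "text" entry
input_list = [i["text"] for i in payload["input_data"]]
print(input_list)
```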

I tried looking through the transformers GitHub issues but couldn't find a matching one. Execution works fine even when using the Flask development server, but it runs forever once I wrap it up and create a Docker image. I am not sure if I am missing any additional dependency that should be included when building the Docker image.

Thanks.

2 Answers


  1. Flask uses port 5000 by default. When building a Docker image, it's important to make sure the port is set up accordingly. Replace the last line with the following:

    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 5000)))
    

    Also be sure to import os at the top.

    Lastly, add the following to the Dockerfile:

    EXPOSE 5000
    CMD ["python", "./main.py"]
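    The port fallback in the suggested app.run line can be sketched in isolation. resolve_port is a hypothetical helper used here only to illustrate the lookup; the real app reads os.environ directly:

    ```python
    def resolve_port(env, default=5000):
        # Same lookup as int(os.environ.get("PORT", 5000)) in the app.run line above:
        # use the PORT variable if the container runtime sets it, else the default.
        return int(env.get("PORT", default))

    print(resolve_port({}))                # no PORT set -> 5000
    print(resolve_port({"PORT": "8080"}))  # PORT provided -> 8080
    ```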
    
  2. I was having a similar issue. It seems that starting the app somehow pollutes the memory of the transformers models, probably something to do with how Flask does threading, but I have no idea why. What fixed it for me was doing the thing that causes trouble (loading the models) in a different thread.

    import threading
    
    
    def preload_models():
        "LOAD MODELS"
        return 0
    
    def start_app():
    
        app = create_app()
        register_handlers(app)
    
        preloading = threading.Thread(target=preload_models)
        preloading.start()
        preloading.join()
    
        return app
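
    The load-in-a-thread pattern above can be demonstrated self-contained. Here preload_models stores a placeholder string instead of a real model; in the actual app it would call AutoModelForSequenceClassification.from_pretrained(...) and the tokenizer/pipeline setup from the question:

    ```python
    import threading

    models = {}

    def preload_models():
        # Placeholder for the real work; the actual app would build the
        # transformers pipeline here and stash it for request handlers.
        models["sentiment"] = "loaded"

    # Load in a separate thread, then wait for it to finish before serving.
    preloading = threading.Thread(target=preload_models)
    preloading.start()
    preloading.join()
    print(models["sentiment"])  # -> loaded
    ```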
    

    First reply here. I would be really glad if this helps.
