Json – FastAPI returns different output than the original function with Hugging Face transformers' NER

chancar
August 31, 2023
220 views
0 votes
2 Answers

I have created an API script with FastAPI to run a HF transformers NER model. However, I am puzzled that the output returned by the API doesn't match the output I get when I run the model directly (the entities found are different). It looks as if the encoding of the input text changes at some point, but I can't figure out where. My code:

main.py

from fastapi import FastAPI
from pydantic import BaseModel
from typing import List
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline


app = FastAPI()

tokenizer1 = AutoTokenizer.from_pretrained("lcampillos/roberta-es-clinical-trials-ner")
model1 = AutoModelForTokenClassification.from_pretrained("lcampillos/roberta-es-clinical-trials-ner")


pipe1 = pipeline(task="ner", model=model1.to("cpu"), tokenizer=tokenizer1)


class PredictionInput(BaseModel):
    text: str


class NERPrediction(BaseModel):
    entity_group: str
    score: float
    word: str
    start: int
    end: int


class PredictionResult(BaseModel):
    predictions: List[NERPrediction]


@app.post("/predict/", response_model=PredictionResult)
async def predict(input_data: PredictionInput):
    # Perform NER inference
    ner_predictions = perform_prediction(input_data.text)

    # Prepare the response
    response = PredictionResult(predictions=ner_predictions)
    return response


def perform_prediction(input_text):

    # Perform NER inference using the provided text
    ner_results = pipe1(input_text)

    # Create NERPrediction instances and populate the list
    ner_predictions = []

    for result in ner_results:
        entity_group = result.get('entity_group') or result.get('entity')
        if entity_group:
            ner_prediction = NERPrediction(
                entity_group=entity_group,
                score=result.get('score', 0.0),
                word=result.get('word', ''),
                start=result.get('start', 0),
                end=result.get('end', 0)
            )
            ner_predictions.append(ner_prediction)

    return ner_predictions

a.py (code run to get the response from the API)

import requests
import urllib

text = "señor se presenta con fiebre y sudores fríos"

# Define the payload as a dictionary
url_encoded_text = urllib.parse.quote(text)
payload = {"text": url_encoded_text}

# Define the URL
url = "http://localhost:8000/predict/"

# Send the POST request with proper headers
headers = {"Content-Type": "application/json"}
response = requests.post(url, json=payload, headers=headers)

# Print the response
print(response.status_code)
print(response.json())

b.py (code run to check if the output is the same)

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

text = "señor se presenta con fiebre y sudores fríos"
tokenizer1 = AutoTokenizer.from_pretrained("lcampillos/roberta-es-clinical-trials-ner")

model1 = AutoModelForTokenClassification.from_pretrained("lcampillos/roberta-es-clinical-trials-ner")

pipe1 = pipeline(task="token-classification", model=model1.to("cpu"), binary_output=True, tokenizer=tokenizer1,
                 aggregation_strategy="average")


def perform_inference(txt):

    ner_output = pipe1(txt)

    return ner_output

print(perform_inference(text))

Output for a.py:

{'predictions': [{'entity_group': 'B-DISO', 'score': 0.9982337951660156, 'word': 'fiebre', 'start': 35, 'end': 41}, {'entity_group': 'B-DISO', 'score': 0.9964337348937988, 'word': 's', 'start': 48, 'end': 49}, {'entity_group': 'B-DISO', 'score': 0.9981486797332764, 'word': 'ud', 'start': 49, 'end': 51}, {'entity_group': 'B-DISO', 'score': 0.9970242381095886, 'word': 'ores', 'start': 51, 'end': 55}, {'entity_group': 'B-DISO', 'score': 0.9723742008209229, 'word': 'os', 'start': 66, 'end': 68}]}

Output for b.py:

[{'entity_group': 'DISO', 'score': 0.99900204, 'word': ' fiebre', 'start': 22, 'end': 28}, {'entity_group': 'DISO', 'score': 0.99870765, 'word': ' sudores fríos', 'start': 31, 'end': 44}]

I have tried removing the URL-encoding and sending the text directly as JSON to the API. This does change the output, but it still does not match the original output.
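The start/end offsets in the a.py output are consistent with the model having seen the URL-encoded string rather than the original sentence. A minimal sketch using only the standard library, the sample sentence, and the offsets copied from the two outputs above (`quote` is the same `urllib.parse.quote` call used in a.py):

```python
from urllib.parse import quote

text = "señor se presenta con fiebre y sudores fríos"
encoded = quote(text)  # the string a.py actually sent in the "text" field

print(encoded)
# se%C3%B1or%20se%20presenta%20con%20fiebre%20y%20sudores%20fr%C3%ADos

# Offsets reported by b.py index into the original text...
print(repr(text[22:28]), repr(text[31:44]))  # 'fiebre' 'sudores fríos'

# ...while offsets reported by the API index into the encoded text
print(repr(encoded[35:41]), repr(encoded[48:55]), repr(encoded[66:68]))  # 'fiebre' 'sudores' 'os'
```

The "os" fragment is the tail of the percent-encoded "fr%C3%ADos", so the tokenizer never saw the word "fríos" at all.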

Many thanks in advance for your support.

Answers

Chosen as BEST ANSWER
- chancar
- August 31, 2023 at 3:48 pm
- 0 votes
Finally sorted it out by forwarding the raw text, as suggested by @lsabi, and rewriting the functions as follows:
```
@app.post("/predict/", response_model=PredictionResult)
async def predict(input_data):
    ner_predictions = perform_inference(input_data)
    response = PredictionResult(predictions=ner_predictions)
    return response

def perform_inference(input_text):
    # pipe1 is the "token-classification" pipeline with
    # aggregation_strategy="average", as defined in b.py
    ner_predictions = pipe1(input_text)
    return ner_predictions
```
This way, they work correctly with the token-classification task.
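Forwarding the raw text works because a JSON body already carries non-ASCII characters safely, so the extra URL-encoding step is unnecessary. The sketch below mimics the round trip with the standard `json` module only; it assumes the client sends a `{"text": ...}` payload as in a.py (requests performs an equivalent serialization for `json=payload`).

```python
import json

text = "señor se presenta con fiebre y sudores fríos"
payload = {"text": text}  # no urllib.parse.quote() step

# Client side: serialize the body
body = json.dumps(payload)
print(body)  # {"text": "se\u00f1or se presenta con fiebre y sudores fr\u00edos"}

# Server side: parse the body back into the original string
received = json.loads(body)["text"]
print(received == text)  # True
```

The `\u00f1` escapes are part of the JSON encoding and are decoded back to "ñ" and "í" on the server, so the offsets the model returns refer to the original sentence.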


- lsabi
- August 30, 2023 at 10:08 pm
- 0 votes
I haven't used these NLP models myself, but the two pipelines are configured differently:

main.py
```
pipe1 = pipeline(task="ner", model=model1.to("cpu"), tokenizer=tokenizer1)
```
b.py
```
pipe1 = pipeline(task="token-classification", model=model1.to("cpu"), 
                 binary_output=True, tokenizer=tokenizer1,
                 aggregation_strategy="average")
```
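The task names differ (although "ner" is an alias for "token-classification" in transformers), and the former lacks both aggregation_strategy and binary_output. Without an aggregation_strategy, the pipeline returns one prediction per sub-word token instead of grouped entities, which matches the fragmented output from main.py.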
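To illustrate why the missing aggregation_strategy matters, here is a deliberately simplified grouping step. It is not the transformers implementation of aggregation_strategy="average"; the `group_tokens` helper is hypothetical. It strips the B-/I- prefix and merges consecutive tokens of the same type whose offsets touch, averaging their scores. The token list is copied from the a.py output.

```python
from statistics import mean
from urllib.parse import quote

# Per-token predictions copied from the a.py output; offsets refer to the
# URL-encoded string the API received
tokens = [
    {"entity_group": "B-DISO", "score": 0.9982337951660156, "word": "fiebre", "start": 35, "end": 41},
    {"entity_group": "B-DISO", "score": 0.9964337348937988, "word": "s", "start": 48, "end": 49},
    {"entity_group": "B-DISO", "score": 0.9981486797332764, "word": "ud", "start": 49, "end": 51},
    {"entity_group": "B-DISO", "score": 0.9970242381095886, "word": "ores", "start": 51, "end": 55},
    {"entity_group": "B-DISO", "score": 0.9723742008209229, "word": "os", "start": 66, "end": 68},
]


def group_tokens(tokens, text):
    """Hypothetical, simplified grouping: merge touching tokens of the same type."""
    groups = []
    for tok in tokens:
        label = tok["entity_group"].split("-", 1)[-1]  # "B-DISO" -> "DISO"
        last = groups[-1] if groups else None
        if last and last["entity_group"] == label and last["end"] == tok["start"]:
            # Token continues the previous span: extend it
            last["end"] = tok["end"]
            last["scores"].append(tok["score"])
        else:
            groups.append({"entity_group": label, "start": tok["start"],
                           "end": tok["end"], "scores": [tok["score"]]})
    # Recover each span's text from its offsets and average the token scores
    return [
        {"entity_group": g["entity_group"], "score": mean(g["scores"]),
         "word": text[g["start"]:g["end"]], "start": g["start"], "end": g["end"]}
        for g in groups
    ]


encoded = quote("señor se presenta con fiebre y sudores fríos")
result = group_tokens(tokens, encoded)
for entity in result:
    print(entity["word"], entity["start"], entity["end"])
# fiebre 35 41
# sudores 48 55
# os 66 68
```

Even after grouping, "os" remains a separate fragment because the model saw "fr%C3%ADos" instead of "fríos", so both changes (plain text in the body and the same pipeline settings as b.py) appear to be needed to reproduce the b.py result.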