TypeError: XXXXX got an unexpected keyword argument 'XXXXXX' - SEO

sirimiri
November 10, 2021

I'm getting an "unexpected keyword argument" error when running code from this tutorial: https://sempioneer.com/python-for-seo/how-to-extract-text-from-multiple-webpages-in-python/
Can anybody help? Thanks.

Running the code below:

single_url = 'https://understandingdata.com/'
text = extract_text_from_single_web_page(url=single_url)
print(text)

gives the error below:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_10260/3606377172.py in <module>
      1 single_url = 'https://understandingdata.com/'
----> 2 text = extract_text_from_single_web_page(url=single_url)
      3 print(text)

~\AppData\Local\Temp/ipykernel_10260/850098094.py in extract_text_from_single_web_page(url)
     42     try:
     43         a = trafilatura.extract(downloaded_url, json_output=True, with_metadata=True, include_comments = False,
---> 44                             date_extraction_params={'extensive_search': True, 'original_date': True})
     45     except AttributeError:
     46         a = trafilatura.extract(downloaded_url, json_output=True, with_metadata=True,

TypeError: extract() got an unexpected keyword argument 'json_output'

The code for extract_text_from_single_web_page(url=single_url):

def extract_text_from_single_web_page(url):
    
    downloaded_url = trafilatura.fetch_url(url)
    try:
        a = trafilatura.extract(downloaded_url, json_output=True, with_metadata=True, include_comments = False,
                            date_extraction_params={'extensive_search': True, 'original_date': True})
    except AttributeError:
        a = trafilatura.extract(downloaded_url, json_output=True, with_metadata=True,
                            date_extraction_params={'extensive_search': True, 'original_date': True})
    if a:
        json_output = json.loads(a)
        return json_output['text']
    else:
        try:
            resp = requests.get(url)
            # We will only extract the text from successful requests:
            if resp.status_code == 200:
                return beautifulsoup_extract_text_fallback(resp.content)
            else:
                # Return NaN when both the Trafilatura and BeautifulSoup4 extractions fail:
                return np.nan
        # Handling for any URLs that don't have the correct protocol
        except MissingSchema:
            return np.nan

Answers

- Samwise
- November 10, 2021 at 1:23 am
- 0 votes
As suggested in my comment, the best option is to find a tutorial that doesn’t use trafilatura, since that seems to be the thing that’s broken. However, it’s pretty simple to modify this particular function to avoid it and just use the fallback:
```
import numpy as np
import requests
from requests.exceptions import MissingSchema

# beautifulsoup_extract_text_fallback is assumed to be defined as in the tutorial.
def extract_text_from_single_web_page(url):
    try:
        resp = requests.get(url)
        # We will only extract the text from successful requests:
        if resp.status_code == 200:
            return beautifulsoup_extract_text_fallback(resp.content)
        else:
            # Return NaN for any unsuccessful response:
            return np.nan
    # Handling for any URLs that don't have the correct protocol
    except MissingSchema:
        return np.nan
```

- Chiel
- November 29, 2021 at 9:16 am
- 0 votes
I agree with Samwise that it is best to stick with standard, well-supported Python modules, but I think there is also a lesson here about version management.

The tutorial you linked simply installs the latest version of each package. That is generally not good practice. In production environments especially, you want to control the versions you install, so that your code does not break because a dependency changed underneath you.
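As a rough illustration of keeping versions under control, here is a minimal sketch that compares what is installed against a set of pins, using only the standard library (`importlib.metadata`, Python 3.8+). The helper names `installed_version` and `check_pins`, and the pin values in the usage comment, are made up for this example and are not part of the tutorial.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pins(pins):
    """Return {package: installed_version} for every package that does not match its pin.

    `pins` maps a package name to the exact version string you expect.
    An empty result means the environment matches all pins.
    """
    mismatches = {}
    for package, wanted in pins.items():
        found = installed_version(package)
        if found != wanted:
            mismatches[package] = found
    return mismatches


# Example usage (the pin is illustrative only):
# print(check_pins({"trafilatura": "0.7.0"}))
```

In practice you would usually put the pins in a requirements file and install from that, so a check like this is mostly useful as a quick sanity test at the top of a notebook.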

In your case, trafilatura version 0.7.0 still supports the json_output keyword argument, but later versions have dropped it, including the latest version at the time of writing, 0.9.3.
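If you cannot pin the version, one defensive option is to check which keyword arguments the installed function actually declares before calling it. The sketch below does this with `inspect.signature`. Note that `extract_new` is a stand-in written for this example, not trafilatura's real `extract`, and `supported_kwargs` is a hypothetical helper name.

```python
import inspect


def supported_kwargs(func, **kwargs):
    """Keep only the keyword arguments that `func` declares in its signature."""
    params = inspect.signature(func).parameters
    # If func accepts **kwargs we cannot tell what it really supports,
    # so pass everything through unchanged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


# Stand-in for a library function whose newer release no longer takes `json_output`.
def extract_new(filecontent, with_metadata=False, include_comments=True):
    return "text from " + filecontent


options = {"json_output": True, "with_metadata": True, "include_comments": False}
safe = supported_kwargs(extract_new, **options)  # `json_output` is filtered out
result = extract_new("<html>...</html>", **safe)
```

Be aware that silently dropping an argument can change what the function returns. In the question's function the result is passed to `json.loads`, which would probably fail if the output is no longer JSON, so pinning a known-good version is the safer fix.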
