
I’m trying to read an Excel file in Python 3.6. Using the code below I get HTTP 200 as the status code for the request; could somebody help me read the contents as well?

import requests

url = "https://<myOrg>.sharepoint.com/:x:/s/x-taulukot/Ec0R1y3l7sdGsP92csSO-mgBI8WCN153LfEMvzKMSg1Zzg?e=6NS5Qh"
session_obj = requests.Session()
response = session_obj.get(url, headers={"User-Agent": "Mozilla/5.0"})
 
print(response.status_code)

When I open the URL in a browser I get an Excel file, so it should be an Excel file (although I don’t get it with curl or wget…)
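One quick diagnostic before trying to parse anything: a SharePoint share link opened without a download flag often answers 200 with an HTML preview page rather than the file itself, so checking the response’s Content-Type header tells you what you actually received. A minimal helper sketch (the function name and MIME list are my own; the MIME strings are the standard Excel types):

```python
def looks_like_excel(headers):
    """Return True if a response's Content-Type header suggests an
    actual Excel payload rather than an HTML preview/login page."""
    ctype = headers.get("Content-Type", "").lower()
    excel_types = (
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",  # .xlsx
        "application/vnd.ms-excel",                                           # legacy .xls
        "application/octet-stream",                                           # generic binary download
    )
    return any(ctype.startswith(t) for t in excel_types)

# e.g. looks_like_excel(response.headers) after the session_obj.get(...) call above
```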

There are also some instructions on this page:

pd.read_csv produces HTTPError: HTTP Error 403: Forbidden

Edit:

Using this test.py:

import pandas as pd
from urllib.request import Request, urlopen
url = "https://<myOrg>.sharepoint.com/:x:/s/x-taulukot/Ec0R1y3l7sdGsP92csSO-mgBI8WCN153LfEMvzKMSg1Zzg?e=6NS5Qh"
req = Request(url)
req.add_header('User-Agent', 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0')

content = urlopen(req)
df = pd.read_csv(content)
print(df)

I get:

(venv) > python test.py
Traceback (most recent call last):
  File "test.py", line 8, in <module>
    df = pd.read_csv(content)
  File "/srv/work/miettinj/beta/python/venv/lib/python3.6/site-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/srv/work/miettinj/beta/python/venv/lib/python3.6/site-packages/pandas/io/parsers.py", line 460, in _read
    data = parser.read(nrows)
  File "/srv/work/miettinj/beta/python/venv/lib/python3.6/site-packages/pandas/io/parsers.py", line 1198, in read
    ret = self._engine.read(nrows)
  File "/srv/work/miettinj/beta/python/venv/lib/python3.6/site-packages/pandas/io/parsers.py", line 2157, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 918, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 905, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2042, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 10, saw 4
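The ParserError above is consistent with the payload not being text at all: pd.read_csv expects a character-delimited file, but an .xlsx file is a ZIP archive, and a share link without a download flag may even return an HTML page. A quick way to see what actually came back is to sniff the leading bytes; this helper and its name are my own sketch, using the standard ZIP and OLE2 magic numbers:

```python
def sniff_payload(first_bytes):
    """Guess what a downloaded 'spreadsheet' actually is from its
    leading bytes (magic numbers)."""
    if first_bytes.startswith(b"PK\x03\x04"):
        return "xlsx"     # .xlsx is a ZIP archive
    if first_bytes.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "xls"      # legacy OLE2 compound file
    if first_bytes.lstrip().startswith(b"<"):
        return "html"     # an HTML page, e.g. a preview or login page
    return "unknown"

# e.g. sniff_payload(response.content[:16]) after the requests call above
```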


Answers


  1. The answer simply wants you to put your URL in place of the placeholder <YOUR URL WITH CSV>. Because you have an Excel file, not a .csv, you can use pd.read_excel(). From the pandas docs:

    Supports xls, xlsx, xlsm, xlsb, odf, ods and odt file extensions read from a local filesystem or URL.

    The code could look like this:

    import pandas as pd
    from urllib.request import Request, urlopen

    url = "https://PATH_TO_YOUR_EXCEL_FILE.xls"
    # annotation: don't post organization info on Stack Overflow!
    req = Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0')

    content = urlopen(req)
    df = pd.read_excel(content)
    print(df)
    

    You could also experiment with passing the URL directly to pd.read_excel(); the documentation states that valid URL schemes include http, ftp, s3, and file.

  2. For some reason I can’t download an Excel file from SharePoint using urllib.request, though it works with the requests package. In any case, you must append &download=1 to your URL to get a direct download link:

    import io

    import pandas as pd
    import requests

    # Check the end of the url -->                                                                             HERE --v
    url = 'https://<myOrg>.sharepoint.com/:x:/s/x-taulukot/Ec0R1y3l7sdGsP92csSO-mgBI8WCN153LfEMvzKMSg1Zzg?e=6NS5Qh&download=1'
    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/114.0'}

    resp = requests.get(url, headers=headers)
    resp.raise_for_status()
    df = pd.read_excel(io.BytesIO(resp.content), engine='openpyxl')
    

    Note: don’t forget to install openpyxl: pip install openpyxl
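The &download=1 trick above can be applied with urllib.parse instead of string concatenation, which also handles links whose query string already contains other parameters (such as ?e=…). The helper name is my own sketch:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_download_flag(url):
    """Append download=1 to a SharePoint share link, keeping any
    existing query parameters intact."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["download"] = "1"
    return urlunsplit(parts._replace(query=urlencode(query)))

# e.g. requests.get(with_download_flag(url), headers=headers)
```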
