skip to Main Content

Environment:
python 3.8.5
ipython 7.20.0
jupyterlab 3.0.7
ubuntu 20.04 LTS
pandas 1.2.2
openpyxl 3.0.10
googledrive (spreadsheet)

I’m doing…

import pandas as pd
ef = pd.ExcelFile('myfile.xlsx')  
df = ef.parse(ef.sheet_names[0], headers=None)
display(df)

Parsing exported xlsx from google spreadsheet to dataframe.
The spreadsheet’s content is following next:

A1 shows 1-1 but actual value is 2022.01.01 00:00:00

The Problem
It always parse A1(=1-1) to pd.Timestamp(2022.01.01 00:00:00).
But I want string value of "1-1".
I think it’s origin value already inserted to datetime type.

I tried
Most of SO’s solve is next.
So i tried that.

df1 = ef.parse(ef.sheet_names[0], headers=None)
df1.columns #=[0,1,2,3,4,5]
df = ef.parse(ef.sheet_names[0], headers=None, converters={c:str for c in df1.columns})
display(df.iloc[0][0])

But it shows string value but "2022-01-01 00:00:00"

Constraints
The spreadsheet’s writer(=operator) says to me "I typed exactly 1-1 on the spreadsheet"
And there are many spreadsheet writer.
So they won’t input ‘1-1 instead of 1-1, and strictly check it is really inserted string type or datetime type.

And google spreadsheet API (not drive api) returns it’s value ‘1-1’, so it works. But that API’s quota is too small (60 calls per 1min, and consumes 1 call per 1 sub-spreadsheet). So I must need google drive api and export it.
That’s why I can’t using spreadsheet API notwithstanding it’s actually works.

hope
I got xlsx exported file from google drive api like next way.

request = _google_drive_client.files().export_media(fileId=file_id,
                                               mimeType='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
file = io.BytesIO()
downloader = MediaIoBaseDownload(file, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
dn = file.getValue()
with open('test.xlsx', 'wb') as f:
    f.write(dn)

And my Apple Numbers shows me the information of ‘1-1’ is still alive in that xlsx file.
enter image description here

So I hope I can find and make ‘1-1’ in python again.

Question
Is there any way xlsx file load to python memory that’s showing way? (I mean ‘1-1’, not following origin value (datetime type 2022 01 01 00:00:00), or other parsing way)
Or simply I want ‘1-1’ from parsing way.
Help me please!

3

Answers


  1. Chosen as BEST ANSWER

    I answer for my own question because I found a answer.

    I stop using for pandas dataframe excel parser.

    I tried many mimeTypes for googledrive API, finally I dropped odf and xlsx export.

    Finally I using 'zip' that makes every sheet to html, and one css file.

    I downloaded it zip, and extract, and finally find the html contains the contents it shows exact same with google spreadsheet.

    My solution is next:

    def extract_zip(input_zip):
        input_zip=ZipFile(input_zip)
        return {name: input_zip.read(name) for name in input_zip.namelist()}
    
    def read_json_from_zip_bytearray(file_bytearray):
        from bs4 import BeautifulSoup as soup
        from zipfile import ZipFile
        
        do = extract_zip(io.BytesIO(file_bytearray))
        
        dw = {k[:-5]:[[row.text for row in columns.find_all('td')] for columns in soup(v).find('table').find_all('tr') if columns.find_all('td')] for k, v in do.items() if k[-5:] == '.html'}
        
        return {k:[singlev for singlev in vv if singlev] for idx, vv in enumerate(v) for k, v in dw.items()}
    
    
    read_json_from_zip_bytearray(downloaded_bytearray_from_googledrive_api_and_zip_mimetype)
    
    #the value is
    {'sheetname1': ['1-1',
       '1-2',
       '1-3',
       '1-4',
       '1-5',
       'other...'],
     'sheetname2': ['2-1', '2-2', '2-3'],
       ...}
    
     #anyway you can make dataframe with pd.DataFrame() method
    

    I appreciate for every answers!

    And I hope someone for who using google drive API & parse it keep it's own showing way.


  2. The problem is not pandas, of course. The problem is Excel. Unless told otherwise, it interprets 1-1 as a date. If you want to override that, start with a single quote: '1-1. The quote won’t show, but Excel will treat the cell as a string.

    Login or Signup to reply.
  3. I hope I understood your question right. If you want to know how you can display 1-1 in it’s actual state as straight text after exporting it to excel, I think it’s best to use pandas with xlwings:

    import pandas as pd
    import xlwings as xw
    
    df = pd.DataFrame([
        (1, "1-1", 8),
        (4, 0, "=1-1")],
        columns=["A", "B", "C"]
        )
    
    with xw.App(visible=False) as app:
        wb = xw.Book()
        ws = wb.sheets[0]
        ws.api.Cells.NumberFormat = "@"  # Or instead: ws.range("A1:Z100").number_format = "@"
        ws["A1"].value = df
        wb.save("test.xlsx")
        wb.close()
    

    The crucial point is to set the NumberFormat or rather number_format property before loading the values of df into the cells. This ensures that 1-1 appears as straight text.

    As a side note: ws.api.Cells.NumberFormat = "@" changes the format of the whole sheet to text. If you prefer to change only a certain range, use ws.range("A1:Z100").number_format = "@".
    Please also note that xlwings requires an excel installation.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search