Environment:
python 3.8.5
ipython 7.20.0
jupyterlab 3.0.7
ubuntu 20.04 LTS
pandas 1.2.2
openpyxl 3.0.10
googledrive (spreadsheet)
I’m doing…
import pandas as pd
ef = pd.ExcelFile('myfile.xlsx')
df = ef.parse(ef.sheet_names[0], header=None)
display(df)
This parses an xlsx file exported from a Google spreadsheet into a DataFrame.
The spreadsheet’s content is as follows:
The Problem
It always parses A1 (= 1-1) as pd.Timestamp('2022-01-01 00:00:00').
But I want the string value "1-1".
I think the original value is already stored in the file as a datetime type.
I tried
Most of the solutions on SO suggest the following.
So I tried that.
df1 = ef.parse(ef.sheet_names[0], header=None)
df1.columns  # = [0, 1, 2, 3, 4, 5]
df = ef.parse(ef.sheet_names[0], header=None, converters={c: str for c in df1.columns})
display(df.iloc[0][0])
It does return a string, but the string is "2022-01-01 00:00:00".
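The converter most likely does not help because it runs after the cell has already been read as a datetime, so str only stringifies the timestamp. A minimal sketch of that effect and of one possible workaround, under the assumption (not guaranteed by the file) that every date-typed cell was originally typed as month-day; month_day_text is a hypothetical helper name:

```python
from datetime import datetime


def month_day_text(value):
    """Rebuild 'M-D' text from a datetime; leave any other value unchanged.

    Assumption: every datetime in the sheet was typed as month-day (e.g. 1-1).
    """
    if isinstance(value, datetime):  # pd.Timestamp is a datetime subclass
        return f"{value.month}-{value.day}"
    return value


cell = datetime(2022, 1, 1)      # what the parser hands over for A1
as_str = str(cell)               # what converters={c: str} produces
rebuilt = month_day_text(cell)   # month-day text rebuilt from the datetime
```

A helper like this could be passed in place of str in the converters mapping, but it cannot distinguish a cell typed as 1-1 from one typed as a full date.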
Constraints
The spreadsheet’s writer (= operator) tells me "I typed exactly 1-1 on the spreadsheet".
And there are many spreadsheet writers.
So I can’t expect them to type ‘1-1 instead of 1-1, or to strictly check whether a value was stored as a string or as a datetime.
The Google Sheets API (not the Drive API) returns the value ‘1-1’, so it works. But that API’s quota is too small (60 calls per minute, and each sub-sheet consumes one call). So I have to use the Google Drive API and export the file.
That’s why I can’t use the Sheets API even though it actually works.
hope
I got the exported xlsx file from the Google Drive API in the following way.
import io
from googleapiclient.http import MediaIoBaseDownload

request = _google_drive_client.files().export_media(
    fileId=file_id,
    mimeType='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
file = io.BytesIO()
downloader = MediaIoBaseDownload(file, request)
done = False
while not done:
    # next_chunk() returns (status, done); done is True after the last chunk
    status, done = downloader.next_chunk()
dn = file.getvalue()  # raw bytes of the exported xlsx
with open('test.xlsx', 'wb') as f:
    f.write(dn)
Apple Numbers shows the cell as ‘1-1’ when I open that xlsx file, so the information is still in there.
So I hope I can recover ‘1-1’ in Python as well.
Question
Is there any way to load an xlsx file into Python so that each cell comes out the way it is displayed? (I mean ‘1-1’, not the underlying value (datetime 2022-01-01 00:00:00); any other parsing approach is fine too.)
Or, simply put, I want ‘1-1’ from the parser.
Help me please!
3 Answers
I am answering my own question because I found a solution.
I stopped using the pandas Excel parser.
I tried many mimeTypes with the Google Drive API and eventually dropped the ODF and xlsx exports.
In the end I used the 'zip' export, which turns every sheet into an HTML file and adds one CSS file.
I downloaded the zip, extracted it, and found that the HTML contains the content exactly as Google Sheets displays it.
My solution is as follows:
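A minimal sketch of this approach using only the standard library, not necessarily the exact code used. It assumes the zip bytes were downloaded with export_media and mimeType='application/zip', and that each sheet is an .html file whose cells are td elements; TableTextParser and read_sheets_from_zip are hypothetical names:

```python
import io
import zipfile
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect the displayed text of every <td> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently open, if any
        self._cell = None  # text fragments of the <td> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def read_sheets_from_zip(zip_bytes):
    """Return {html_file_name: rows}, where rows is a list of lists of str."""
    sheets = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".html"):
                continue  # skip the CSS file and anything else
            parser = TableTextParser()
            parser.feed(zf.read(name).decode("utf-8"))
            parser.close()
            sheets[name] = parser.rows
    return sheets
```

Every cell comes back as the text shown in the sheet, so any numeric or date typing has to be applied afterwards if it is needed.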
I appreciate every answer!
And I hope this helps anyone who uses the Google Drive API and needs the parsed values to keep the way they are displayed.
The problem is not pandas, of course. The problem is Excel. Unless told otherwise, it interprets 1-1 as a date. If you want to override that, start with a single quote: '1-1. The quote won’t show, but Excel will treat the cell as a string.
I hope I understood your question right. If you want to know how you can display 1-1 in its actual state as plain text after exporting it to Excel, I think it’s best to use pandas with xlwings:
The crucial point is to set the NumberFormat, or rather number_format, property before loading the values of df into the cells. This ensures that 1-1 appears as plain text.
As a side note: ws.api.Cells.NumberFormat = "@" changes the format of the whole sheet to text. If you prefer to change only a certain range, use ws.range("A1:Z100").number_format = "@".
Please also note that xlwings requires an Excel installation.