When I try to get the names of the files in an uploaded archive, they come out like this: »α«ß»Ñ¬Γ ê. ƒ¬«ó½Ñóáπ½¿µá ïÑ¡¿¡ß¬«ú« 諼߫¼«½á, ó αá⌐«¡Ñ ñ. 1, Æû îÆé µÑ¡Γα, ¡á ßóÑΓ«Σ«αÑ. The archive was created on Windows 10. When I build the archive on Ubuntu and read the file names from it, there is no such problem.
How can this be fixed?
The archive was sent by the client. It is not clear how to repeat such an error locally
    from zipfile import ZipFile

    with ZipFile('arhive.zip') as myzip:
        for name in myzip.namelist():
            try:
                uname = name.encode("IBM437").decode("utf-8")
            except UnicodeDecodeError:
                uname = name.encode("IBM437").decode("IBM866")
            except UnicodeEncodeError as err:
                uname = name
            print(uname)
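One possible way to reproduce the problem locally, as a rough sketch: write an archive with an ASCII placeholder name and then patch the stored name bytes to cp866, so the entry has a non-UTF-8 name and no UTF-8 flag. The helper name `make_legacy_zip` and the placeholder trick are illustrative assumptions, not something taken from the client's archive.

```python
import io
from zipfile import ZipFile

def make_legacy_zip(name: str, encoding: str = "cp866") -> bytes:
    """Build an in-memory zip whose entry name is stored as raw `encoding`
    bytes with the UTF-8 flag unset (hypothetical helper for testing)."""
    raw = name.encode(encoding)
    # ASCII placeholder with the same byte length, so the recorded
    # name length in the headers stays valid after patching.
    placeholder = b"N" * len(raw)
    buf = io.BytesIO()
    with ZipFile(buf, "w") as zf:
        zf.writestr(placeholder.decode("ascii"), b"data")
    # The name occurs in the local header and in the central directory;
    # the CRC covers only the file data, so replacing the name is safe here.
    return buf.getvalue().replace(placeholder, raw)

with ZipFile(io.BytesIO(make_legacy_zip("проспект"))) as myzip:
    names = myzip.namelist()  # the name comes back as cp437 mojibake
```

Feeding `names` through the loop above should then exercise the IBM866 fallback branch.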
2 Answers
I decided to look at how the zipfile library decodes file names, and I found the following condition:
It only handles two cases:
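The two cases can be sketched roughly as follows. This is a simplified paraphrase of the behaviour, not the library source; `decode_zip_name` and `UTF8_FLAG` are names made up for the illustration.

```python
UTF8_FLAG = 0x800  # bit 11 of the entry's general purpose bit flag

def decode_zip_name(raw: bytes, flag_bits: int) -> str:
    """Hypothetical helper mirroring how the name encoding is chosen."""
    if flag_bits & UTF8_FLAG:
        # Case 1: the archiver declared the name to be UTF-8.
        return raw.decode("utf-8")
    # Case 2: no flag, so the historical ZIP encoding cp437 is assumed,
    # whatever encoding the archiver actually used.
    return raw.decode("cp437")
```

Any archiver that writes cp866 or UTF-8 names without setting the flag therefore lands in the second case and produces mojibake.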
But sometimes files from macOS end up in the second branch, filename.decode('cp437'), although they belong in the first one, so the name has to be encoded back to cp437 and then decoded as UTF-8. Initially I did this:
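A minimal sketch of that kind of re-decoding, not necessarily the exact code used: it checks the UTF-8 flag on each `ZipInfo` and only touches names that went through the cp437 branch. The helper name `fix_name` and the cp866 fallback are assumptions chosen to match the question.

```python
from zipfile import ZipInfo

def fix_name(info: ZipInfo, fallback: str = "cp866") -> str:
    """Return a best-effort readable name for a zip entry (hypothetical helper)."""
    if info.flag_bits & 0x800:
        # UTF-8 flag set: zipfile already decoded the name correctly.
        return info.filename
    # Flag not set: zipfile decoded the raw bytes as cp437, so recover them.
    raw = info.filename.encode("cp437")
    try:
        # Archives from macOS tools often hold UTF-8 bytes without the flag.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Otherwise assume the legacy Windows/DOS code page.
        return raw.decode(fallback)
```

It would be applied as `[fix_name(info) for info in myzip.infolist()]`. Checking the flag avoids re-encoding names that are already correct. When the archive's encoding is known in advance, Python 3.11 and later also accept `ZipFile(path, metadata_encoding="cp866")` for reading.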
Then I found out that a file name that is already encoded can be passed to the function, in which case the function returns False and the names after it are not decoded.
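You are facing this mojibake case: the name was stored as cp866 bytes but decoded as cp437, or stored as UTF-8 bytes but decoded as cp437.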
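Both variants can be shown on a plain string, with no archive involved. The sample word is the first word of the name in the question: `»α«ß»Ñ¬Γ` turns back into `проспект` when the cp437 step is undone. The variable names are only illustrative.

```python
original = "проспект"

# cp866 bytes misread as cp437 (typical for archives made on Russian Windows)
garbled_866 = original.encode("cp866").decode("cp437")      # '»α«ß»Ñ¬Γ'
restored_866 = garbled_866.encode("cp437").decode("cp866")  # 'проспект'

# UTF-8 bytes misread as cp437 (archiver did not set the UTF-8 flag)
garbled_utf8 = original.encode("utf-8").decode("cp437")
restored_utf8 = garbled_utf8.encode("cp437").decode("utf-8")
```

Because cp437 assigns a character to every byte value, the wrong decoding loses nothing and the round trip back to the raw bytes is exact.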