Feeling a bit crazy here. I’ve got Apache set up with mod_wsgi, but I can’t get the encoding to work properly. I have:
- tested that mod_wsgi is running in daemon mode
- read Graham Dumpleton’s blog post about setting up the lang and locale settings for the WSGIDaemonProcess directive
- created a minimal test that seems to demonstrate the issue
# I recompiled the mod_wsgi file to get the Python version correct
sys.version = '3.8.6 (default, Sep 24 2020, 21:54:23) \n[GCC 8.3.0]'
sys.prefix = '/usr/local'
sys.path = ['/usr/local/lib/python38.zip', '/usr/local/lib/python3.8', '/usr/local/lib/python3.8/lib-dynload', '/usr/local/lib/python3.8/site-packages', '/usr/local/src/scorched']
# This seems to be a timing thing? Not sure, but possibly problematic
locale.getlocale() = (None, None)
# This was fixed by setting lang or locale (not sure which)
locale.getdefaultlocale() = ('en_US', 'UTF-8')
sys.getdefaultencoding() = 'utf-8'
# These seem like a problem...
sys.getfilesystemencoding() = 'ascii'
locale.getpreferredencoding(False): 'ANSI_X3.4-1968'
# It's daemon mode
mod_wsgi.process_group = 'cl'
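For anyone who wants to reproduce output like the above, here is a minimal sketch of a WSGI test script. It is not the script used in the question; the helper name collect_encoding_report is made up for illustration, and locale.getdefaultlocale() is left out because newer Python versions deprecate it.

```python
import locale
import sys


def collect_encoding_report():
    """Return (label, value) pairs describing the interpreter's encoding setup."""
    return [
        ("sys.version", sys.version),
        ("sys.prefix", sys.prefix),
        ("locale.getlocale()", locale.getlocale()),
        ("sys.getdefaultencoding()", sys.getdefaultencoding()),
        ("sys.getfilesystemencoding()", sys.getfilesystemencoding()),
        ("locale.getpreferredencoding(False)", locale.getpreferredencoding(False)),
    ]


def application(environ, start_response):
    """WSGI entry point: report the values as plain text."""
    lines = [f"{label} = {value!r}" for label, value in collect_encoding_report()]
    # mod_wsgi puts this key in the WSGI environ; other servers will show None.
    lines.append(
        f"mod_wsgi.process_group = {environ.get('mod_wsgi.process_group')!r}"
    )
    body = "\n".join(lines).encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

Pointing WSGIScriptAlias at a file like this and requesting the page shows what the daemon process actually sees, which can differ from what an interactive shell reports.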
My WSGI configs look like this:
WSGIScriptAlias / /opt/courtlistener/docker/apache/wsgi-configs/python_version_test.py
WSGIDaemonProcess cl \
    threads=10 \
    processes=64 \
    python-path=/usr/local/lib/python3.8/site-packages/ \
    lang='en_US.UTF-8' \
    locale='en_US.UTF-8'
WSGIProcessGroup cl
WSGIApplicationGroup %{GLOBAL}
WSGIPassAuthorization On
When I log into the server and start python in the terminal, this line works fine, but it fails when it runs via mod_wsgi:
from reporters_db import REPORTERS
All that line does is import a json file that has some utf-8 content in it. Here’s the code behind that import:
db_root = os.path.dirname(os.path.realpath(__file__))
with open(os.path.join(db_root, "data", "reporters.json")) as f:
REPORTERS = json.load(f, object_hook=datetime_parser)
Since the open() call above doesn’t specify an encoding, it falls back to the locale’s preferred encoding, which is ASCII here, and fails:
Traceback (most recent call last):
File "/opt/courtlistener/docker/apache/wsgi-configs/python_version_test.py", line 6, in <module>
from reporters_db import REPORTERS
File "/usr/local/lib/python3.8/site-packages/reporters_db/__init__.py", line 22, in <module>
REPORTERS = json.load(f, object_hook=datetime_parser)
File "/usr/local/lib/python3.8/json/__init__.py", line 293, in load
return loads(fp.read(),
File "/usr/local/lib/python3.8/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 441720: ordinal not in range(128)
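The failure can be reproduced outside Apache with a small sketch like the one below. The file name and payload are hypothetical stand-ins for reporters.json, and encoding="ascii" is passed explicitly to imitate what open() does when the preferred encoding is ASCII, as in the output above.

```python
import json
import os
import tempfile

# Hypothetical stand-in for reporters.json: the apostrophe is U+2019,
# which UTF-8 encodes starting with byte 0xe2.
payload = {"name": "Reporter’s Digest"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, ensure_ascii=False)

    # Reading as ASCII imitates a process whose preferred encoding is ASCII.
    try:
        with open(path, encoding="ascii") as f:
            json.load(f)
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

    # An explicit encoding makes the read independent of the process locale.
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print("ASCII read failed:", ascii_failed)
print("UTF-8 read round-trips:", loaded == payload)
```

Passing encoding="utf-8" only helps in code I control, though. Here the open() call lives in a third-party package, so the process-wide setting still matters.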
How can I tell it (and the rest of my codebase) to use utf-8 like sane adults?
Edit 1
Perhaps it is important to mention that I’m running apache with the following command:
exec apache2ctl -D FOREGROUND "$@"
I thought that would source the /etc/apache2/envvars file, so I appended the following to that file:
export LANG="en_US.UTF-8"
And I tried tweaking my startup command to:
LANG="en_US.UTF-8" exec apache2ctl -D FOREGROUND "$@"
I was hopeful, but no. Still no progress.
2 Answers
Well, I finally figured this out by searching for every time Graham Dumpleton mentioned the word "lang" on the Internet. That eventually turned up this thread, which mentioned that it was possible to not have a locale installed. I was able to check that by running
locale -a
inside my Ubuntu Docker image, which revealed that en_US.UTF-8 was not among the installed locales.
So that's the issue! mod_wsgi doesn't know what I'm asking for when I ask for en_US.utf-8, and it doesn't throw an error either. Swapping my settings to instead be set to C.UTF-8 fixed this immediately.
I'm running a slim docker image, so that must be why I lack locales. I also don't have a file at /etc/default/locale that a lot of other answers in this general area refer to.
I've filed this as a bug.
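The same check can be done from Python, which is handy when there is no shell access to the container. This is only a sketch: locale_is_available is a made-up helper name, and it relies on locale.setlocale() raising locale.Error for a locale the C library does not recognise, which is how glibc-based images such as Debian and Ubuntu behave.

```python
import locale


def locale_is_available(name):
    """Return True if the C library accepts `name` as an LC_CTYPE locale."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore whatever was active


for candidate in ("en_US.UTF-8", "C.UTF-8", "C"):
    print(candidate, locale_is_available(candidate))
```

Since setlocale() changes process-wide state, this is better run as a one-off script or at startup than inside a threaded request handler.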
I had a similar UnicodeDecodeError when parsing a YAML file containing Unicode characters on Debian 11 with Apache2 and mod_wsgi.
Setting the WSGIDaemonProcess locale to C.UTF-8 was enough to make the error go away. It was a single-line change in my /etc/apache2/sites-available/000-default.conf.
In the question, mlissner mentions trying a number of other settings, but I didn't need any of them.