I'm facing a problem with umlauts in Groovy/Java on an Ubuntu server.
This Groovy code returns false from exists() for files with umlauts in their names:
def f1 = new File('/var/lib/jenkins/test/')
def files = [:]
f1.listFiles().each {
files.put(it.name, it.getAbsoluteFile().exists())
}
println files
println 'file.encoding:' + System.getProperty('file.encoding')
Results in:
Verderblichkeit.docx:true
Gefa��hrlichkeit.docx:false
file.encoding:"iso-8859-1"
So it returns false for a file that it found itself with listFiles(). That is wrong.
ls -al in the directory:
drwxr-xr-x 2 jenkins jenkins 4096 Jan 5 18:17 .
drwxr-xr-x 66 jenkins jenkins 12288 Jan 5 18:16 ..
-rw-r--r-- 1 jenkins jenkins 98035 Jan 5 18:16 Gefährlichkeit.docx
-rw-r--r-- 1 jenkins jenkins 277515 Jan 5 18:17 Verderblichkeit.docx
On Linux I can cp, mv, or rename the files, and the umlauts are displayed correctly.
Environment:
- Version of Java: Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Note: In the original problem, the file path comes from a database. The file can be found and served through nginx, but in the Java app (Grails with Groovy files) File.exists() returns false.
What can I do?
I tried setting file.encoding to UTF-8, both in the application environment and with a -D parameter at startup. I searched the web but didn't find a solution.
2 Answers
Solution
The problem occurred in several environments.
Short answer: The Java system property sun.jnu.encoding was set incorrectly. The solution was to set it correctly in each environment.
Long answer: We had to set the Java system property 'sun.jnu.encoding' in each of the following environments:
1. dev env
Set system properties in the bootRun section in build.gradle:
2. tomcat 9 on server
Set system properties in setenv.sh in tomcat/bin:
3. tomcat 9 in docker container in prod env
We used this solution: https://stackoverflow.com/a/28406007/14748724. It requires rebuilding the container image.
Finally we had to set this in the docker-compose.yaml file:
Before it was LC_ALL: 'C', which was wrong.
Note: Using the setenv.sh solution from env 2 didn't work in the container!
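To check whether the setting actually took effect in a given environment, a small diagnostic like the following might help. It is only a sketch: the class name JnuEncodingCheck and the helper survivesRoundTrip are made up for illustration, and the round trip is a simplified model of what happens when the JVM decodes a UTF-8 file name with the wrong charset and then encodes it again for the exists() call. It assumes the JVM was effectively using US-ASCII, which is what typically happens under LC_ALL: 'C'.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class JnuEncodingCheck {

    // Hypothetical helper: returns true if a file name stored as UTF-8 bytes on disk
    // is still the same byte sequence after being decoded with `jnu` and re-encoded.
    // This is a simplified model, not the JVM's actual native code path.
    static boolean survivesRoundTrip(String name, Charset jnu) {
        byte[] onDisk = name.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(onDisk, jnu);   // roughly what listFiles() hands back
        byte[] reencoded = decoded.getBytes(jnu);   // roughly what exists() passes to the OS
        return Arrays.equals(onDisk, reencoded);
    }

    public static void main(String[] args) {
        // The values the JVM actually picked up at startup.
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("default charset  = " + Charset.defaultCharset());
        System.out.println("LC_ALL           = " + System.getenv("LC_ALL"));

        String name = "Gef\u00e4hrlichkeit.docx";
        System.out.println("US-ASCII round trip: "
                + survivesRoundTrip(name, StandardCharsets.US_ASCII));
        System.out.println("UTF-8 round trip:    "
                + survivesRoundTrip(name, StandardCharsets.UTF_8));
    }
}
```

In this model the US-ASCII round trip fails, because the umlaut bytes are replaced during decoding and cannot be recovered, while the UTF-8 round trip succeeds. That matches the symptom in the question: the name comes back from listFiles() with replacement characters, and the path built from it no longer points at the real file.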
This is not an answer as such, but it allows me to show the problem with Unicode composition and file names. Let’s create two files with the same name:
What!? Hang on, this is a trick isn’t it? They are really the same file? Here’s proof they are different:
131467 ä.txt 131527 ä.txt
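The same effect can be reproduced from Java. The sketch below assumes the two files differ only in Unicode normalization form, one precomposed (NFC) and one decomposed (NFD); the commands that created them are not shown here, so that is a guess. The class name UmlautForms and its helpers are hypothetical; java.text.Normalizer is part of the standard library.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

class UmlautForms {

    // Precomposed form (NFC): the umlaut is the single code point U+00E4.
    static String composed(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Decomposed form (NFD): 'a' followed by the combining diaeresis U+0308.
    static String decomposed(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // UTF-8 bytes of the string as space-separated hex, i.e. what ends up on disk.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String nfc = composed("\u00e4.txt");
        String nfd = decomposed("\u00e4.txt");
        System.out.println("NFC bytes: " + hex(nfc));
        System.out.println("NFD bytes: " + hex(nfd));
        System.out.println("equal as strings: " + nfc.equals(nfd));
        System.out.println("equal after NFC:  " + composed(nfc).equals(composed(nfd)));
    }
}
```

Both names render identically in a terminal, but they are different byte sequences, so a file system that compares names byte by byte treats them as two files. If a path stored in a database uses one form and the file on disk uses the other, normalizing both sides to the same form before comparing is one way to rule this out.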