
I'm facing a problem with umlauts in Groovy/Java on an Ubuntu server.

This Groovy code returns false from exists() for files with umlauts in their names:

def f1 = new File('/var/lib/jenkins/test/')
def files = [:]
f1.listFiles().each {
  files.put(it.name, it.getAbsoluteFile().exists())
}
println files
println 'file.encoding:' + System.getProperty('file.encoding')

Results in:

Verderblichkeit.docx:true
Gefa��hrlichkeit.docx:false
file.encoding:"iso-8859-1"

So exists() returns false for a file that listFiles() itself returned. That is wrong.

ls -al in the directory:
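One way to narrow this down is to look at what the JVM actually decoded for each file name. The following is a minimal Java sketch for diagnosis only, not a fix; the class name `NameInspector` and its helper methods are hypothetical. It prints the code points of every name returned by `listFiles()` next to the result of `exists()`, and flags names containing U+FFFD (the replacement character), which would suggest that the bytes on disk could not be decoded with the encoding the JVM uses for file names. Note that `sun.jnu.encoding` is an internal JDK property, so its value is a hint rather than a guarantee.

```java
import java.io.File;

public class NameInspector {

    // Render every code point of s as U+XXXX, separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // True if the name contains the Unicode replacement character,
    // i.e. some bytes of the on-disk name were probably not decodable.
    static boolean hasReplacementChar(String s) {
        return s.indexOf('\uFFFD') >= 0;
    }

    public static void main(String[] args) {
        // Directory to inspect; defaults to the current directory.
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println("file.encoding:    " + System.getProperty("file.encoding"));
        System.out.println("sun.jnu.encoding: " + System.getProperty("sun.jnu.encoding"));

        File[] entries = dir.listFiles();
        if (entries == null) {
            System.out.println("Not a readable directory: " + dir);
            return;
        }
        for (File f : entries) {
            String name = f.getName();
            System.out.printf("%s exists=%b lossy=%b [%s]%n",
                    name, f.exists(), hasReplacementChar(name), codePoints(name));
        }
    }
}
```

Run it as `java NameInspector /var/lib/jenkins/test/`. A name that shows `lossy=true` together with `exists=false` would be consistent with a file-name encoding problem rather than a permissions problem.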

drwxr-xr-x  2 jenkins jenkins   4096 Jan  5 18:17 .
drwxr-xr-x 66 jenkins jenkins  12288 Jan  5 18:16 ..
-rw-r--r--  1 jenkins jenkins  98035 Jan  5 18:16 Gefährlichkeit.docx
-rw-r--r--  1 jenkins jenkins 277515 Jan  5 18:17 Verderblichkeit.docx

On Linux I can cp, mv or rename the files, and the umlauts are displayed correctly.

Environment:

  • Version of Java: Java(TM) SE Runtime Environment (build 1.8.0_131-b11)

Note: The original problem is that the file path comes from a database. The file can be found and served through nginx, but in the Java app (Grails with Groovy files) File.exists() returns false.

What can I do?

I tried setting file.encoding to UTF-8, both in the application environment and with a -D parameter at startup. I searched the web but didn’t find a solution.

2 Answers


  1. Chosen as BEST ANSWER

    Solution

    The problem occurred in three different environments:

    1. development env: Grails 4 application started with gradle bootRun
    2. CI-stage with a tomcat 9 server
    3. production env: tomcat running in a docker container

    Short answer: The Java system property sun.jnu.encoding had the wrong value. The solution was to set it to UTF-8 in the way each environment requires.

    Long answer: We had to set the Java system property 'sun.jnu.encoding' differently in each environment:

    1. dev env

    Set system properties in the bootRun section in build.gradle:

    bootRun {
        jvmArgs(
            '-Dsun.jnu.encoding=UTF-8',
            '-Dfile.encoding=UTF-8',
            ...)
    }
    

    2. tomcat 9 on server

    Set system properties in setenv.sh in tomcat/bin:

    export JAVA_OPTS="-Dsun.jnu.encoding=UTF-8 -Dfile.encoding=UTF-8 $JAVA_OPTS"
    

    3. tomcat 9 in docker container in prod env

    We used this solution: https://stackoverflow.com/a/28406007/14748724. It required rebuilding the container image.

    Finally we had to set this in the docker-compose.yaml file:

    tomcat:
       environment:
          LC_ALL: 'en_US.UTF-8'
    

    Before it was LC_ALL: 'C', which was wrong.

    Note: The setenv.sh solution from environment 2 did not work in the container!
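    To confirm that a given environment actually picked up the settings, a small check like the one below can be run with the same JVM options and the same environment variables as the application. This is a sketch under assumptions: the class name `EncodingCheck` and the helper `isUtf8` are hypothetical, and `sun.jnu.encoding` is an internal JDK property whose behavior may differ between Java versions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    // True if name is a charset name or alias that resolves to UTF-8.
    static boolean isUtf8(String name) {
        if (name == null) {
            return false;
        }
        try {
            return Charset.forName(name).equals(StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return false;
        }
    }

    public static void main(String[] args) {
        // The two system properties set in the answer above.
        for (String key : new String[] {"sun.jnu.encoding", "file.encoding"}) {
            String value = System.getProperty(key);
            String marker = isUtf8(value) ? "" : "  <-- not UTF-8";
            System.out.println(key + "=" + value + marker);
        }
        // Locale variables that the JVM may derive its defaults from.
        System.out.println("LC_ALL=" + System.getenv("LC_ALL"));
        System.out.println("LANG=" + System.getenv("LANG"));
    }
}
```

    In the container case, running this inside the container before and after changing LC_ALL should show whether the locale change reached the JVM.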


  2. This is not an answer as such, but it allows me to show the problem with Unicode composition and file names. Let’s create two files with the same name:

    goose@t410:/tmp$ touch $(echo -e '\x61\xCC\x88.txt')
    goose@t410:/tmp$ touch $(echo -e '\xC3\xA4.txt')
    goose@t410:/tmp$ ls *.txt
    ä.txt  ä.txt
    

    What!? Hang on, this is a trick isn’t it? They are really the same file? Here’s proof they are different:

    goose@t410:/tmp$ ls -i *.txt
    131467 ä.txt  131527 ä.txt
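    The same distinction is visible from Java. The sketch below uses `java.text.Normalizer` to show that the decomposed form (a followed by U+0308) and the precomposed form (U+00E4) are different strings until both are normalized. The class name `NormalizationDemo` and the helper `sameAfterNfc` are hypothetical. Whether normalization is involved in the original question is an assumption; it would only matter if the path stored in the database uses a different form than the name on disk.

```java
import java.text.Normalizer;

public class NormalizationDemo {

    // Compare two names after converting both to the composed form (NFC).
    static boolean sameAfterNfc(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String decomposed = "a\u0308.txt"; // 'a' followed by a combining diaeresis
        String composed = "\u00E4.txt";    // single precomposed a-umlaut

        System.out.println(decomposed.equals(composed));                     // false
        System.out.println(sameAfterNfc(decomposed, composed));              // true
        System.out.println(decomposed.length() + " vs " + composed.length()); // 6 vs 5
    }
}
```

    On a file system that stores names as raw bytes, these two strings refer to two different files, which is why a lookup with the "wrong" form can fail even though the names look identical.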
