Ubuntu - make diff ignore case of umlauts

Alfe
July 7, 2024

I need diff to ignore case in my inputs, which contain German umlauts such as ä and Ä. The -i option makes diff ignore case for other characters such as a and A, but not for umlauts:

$ diff -i <(echo ä) <(echo Ä)
1c1
< ä
---
> Ä

The output should be empty, as ä and Ä should be seen as the same letter if case is ignored. If I try this instead:

$ diff -i <(echo a) <(echo A)

Then it works as expected (no output).

I also tried setting the LANG environment variable so that diff would use the correct locale, but this did not seem to have any effect:

$ LANG=de_DE.UTF-8 diff -i <(echo ä) <(echo Ä)

I tried various values for LANG.

Is there a way to make diff ignore the case of German umlauts?

(I’m on Ubuntu 22.04 FWIW.)

Answers

- jhnc
- July 7, 2024 at 7:58 am
A simple approach:
```
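# Encode: escape the marker itself, then replace each umlaut with an ASCII
# placeholder that differs only in case, so diff -i treats them as equal.
de-ascii()(
    sed '
        s/utf/utf_/g
        s/ä/utf{ae}/g
        s/Ä/utf{Ae}/g
    ' "$@"
)

# Decode: reverse the steps above in the opposite order.
ascii-de()(
    sed '
        s/utf{ae}/ä/g
        s/utf{Ae}/Ä/g
        s/utf_/utf/g
    ' "$@"
)

# Encode both inputs, compare them, then decode diff's output.
diff -i <(echo ä | de-ascii) <(echo Ä | de-ascii) | ascii-de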
```
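- Select an escape sequence that won't appear in normal diff output.
  - (utf is probably not a good choice.)
- de-ascii transliterates the affected characters into ASCII placeholders that differ only in case (only ä and Ä here; other umlauts need analogous rules).
- ascii-de undoes the transliteration.
- The final command encodes both inputs, runs diff -i, and decodes the result.
- This assumes a version of sed that correctly handles UTF-8.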
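The same encode/diff/decode idea can be stretched to cover ö and ü as well. The sketch below is only an illustration: the function names and the `@` marker are my own choices, not part of the answer above. It uses a marker without letters because a marker like utf also interacts with -i (a line containing utf becomes utf_, while UTF stays unchanged, so the two no longer compare equal). It takes two file names and uses temporary files, so it does not depend on process substitution.

```shell
# Sketch only: function names and the "@" marker are illustrative choices.

# Escape literal "@" first so the placeholders stay unambiguous, then replace
# each umlaut with a placeholder that differs only in ASCII case.
encode_umlauts() {
    sed -e 's/@/@_/g' \
        -e 's/ä/@{ae}/g' -e 's/Ä/@{Ae}/g' \
        -e 's/ö/@{oe}/g' -e 's/Ö/@{Oe}/g' \
        -e 's/ü/@{ue}/g' -e 's/Ü/@{Ue}/g' "$@"
}

# Reverse the encoding: placeholders first, escaped "@" last.
decode_umlauts() {
    sed -e 's/@{ae}/ä/g' -e 's/@{Ae}/Ä/g' \
        -e 's/@{oe}/ö/g' -e 's/@{Oe}/Ö/g' \
        -e 's/@{ue}/ü/g' -e 's/@{Ue}/Ü/g' \
        -e 's/@_/@/g' "$@"
}

# Compare two files with diff -i, treating ä/Ä, ö/Ö and ü/Ü as case pairs.
# Prints the decoded diff and returns diff's own exit status.
diff_i_umlauts() {
    diu_tmp1=$(mktemp) || return 2
    diu_tmp2=$(mktemp) || { rm -f "$diu_tmp1"; return 2; }
    encode_umlauts "$1" > "$diu_tmp1"
    encode_umlauts "$2" > "$diu_tmp2"
    diu_status=0
    diu_out=$(diff -i "$diu_tmp1" "$diu_tmp2") || diu_status=$?
    rm -f "$diu_tmp1" "$diu_tmp2"
    if [ -n "$diu_out" ]; then
        printf '%s\n' "$diu_out" | decode_umlauts
    fi
    return "$diu_status"
}
```

In bash it can still be called with process substitution, for example `diff_i_umlauts <(echo ä) <(echo Ä)`, since the arguments are just readable paths.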

- JosefZ
- July 7, 2024 at 10:31 pm
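Compare normalized strings (see Unicode normalization forms). In NFD, ä is decomposed into a followed by the combining diaeresis U+0308, and Ä into A followed by the same combining character, so diff -i only has to ignore the case of the ASCII base letter:
```
diff -i <(echo ä | uconv -x Any-NFD) <(echo Ä | uconv -x Any-NFD)
```
Note: uconv comes from the icu-devtools package (sudo apt install icu-devtools).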

FYI:
```
Form   String StrLen Unicode
----   ------ ------ -------
NFC    äÄ          2 \u00e4\u00c4
NFD    äÄ          4 \u0061\u0308\u0041\u0308
NFKC   äÄ          2 \u00e4\u00c4
NFKD   äÄ          4 \u0061\u0308\u0041\u0308
```
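To see why the decomposed forms in the table compare equal under -i, here is a small sketch that writes the two NFD sequences by hand with octal escapes, so it does not need uconv. The variable names are made up for the illustration; the bytes \314\210 are the UTF-8 encoding of U+0308.

```shell
# Sketch: build the NFD forms manually instead of calling uconv.
nfd_lower=$(mktemp)
nfd_upper=$(mktemp)
printf 'a\314\210\n' > "$nfd_lower"   # "a" + U+0308 (decomposed ä)
printf 'A\314\210\n' > "$nfd_upper"   # "A" + U+0308 (decomposed Ä)

# Only the ASCII base letter differs between the two files, so -i should
# report no difference.
if diff -i "$nfd_lower" "$nfd_upper" > /dev/null; then
    nfd_result="same"
else
    nfd_result="different"
fi
echo "$nfd_result"

rm -f "$nfd_lower" "$nfd_upper"
```

Keep in mind that diff then prints the decomposed form in its output, which usually renders the same but is a different byte sequence from the precomposed input.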
