I am grepping multiple files on Ubuntu with this command:
LANG=en_US.UTF-8 grep -P -R -i -I -H -A1 "^name#.*?r[AÀÁÂÃÄaàáâãä]f[AÀÁÂÃÄaàáâãä][EÈÉÊËeèéêë]l s[IÌÍÎÏiìíîï]m[OÔÒÓÕÖoòóôõö].*?#.*?#.*?#.*?#.*?$" image_args_*
This returns a few results, this being one of them:
image_args_search_134.txt:name#Rafael Simões Vieira#1767###Emerenciana Rodrigues de Oliveira
image_args_search_134.txt-#bati.#134#somelinkhere.com##
--
But if I add [EÈÉÊËeèéêë] to the pattern, as shown below:
LANG=en_US.UTF-8 grep -P -R -i -I -H -A1 "^name#.*?r[AÀÁÂÃÄaàáâãä]f[AÀÁÂÃÄaàáâãä][EÈÉÊËeèéêë]l s[IÌÍÎÏiìíîï]m[OÔÒÓÕÖoòóôõö][EÈÉÊËeèéêë].*?#.*?#.*?#.*?#.*?$" image_args_*
Then I get nothing.
Why is that?
Thanks!
2 Answers
I can reproduce your problem only if I use a locale that is not installed.
Please verify that the required locales are activated in /etc/locale.gen, i.e. not commented out by a leading #. (The file can be adapted to your needs by removing the comment marker # in front of each locale you want.)
Then make sure that the configured locales have really been generated, for example by running locale-gen and checking the output of locale -a.
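A minimal sketch of that check, assuming a glibc system where locale -a lists the generated locales. The helper names normalise_locale and has_locale are made up for illustration and are not standard tools:

```shell
#!/bin/sh
# Hypothetical helper: lower-case a locale name and spell the codeset the way
# glibc prints it in `locale -a` (en_US.UTF-8 -> en_us.utf8).
normalise_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/utf-8/utf8/'
}

# Hypothetical helper: succeeds if the named locale is listed by `locale -a`.
has_locale() {
    wanted=$(normalise_locale "$1")
    locale -a 2>/dev/null | tr '[:upper:]' '[:lower:]' | sed 's/utf-8/utf8/' \
        | grep -qxF -- "$wanted"
}

if has_locale en_US.UTF-8; then
    echo "en_US.UTF-8 is generated"
else
    # On Debian/Ubuntu: uncomment the line in /etc/locale.gen, then run locale-gen.
    echo "en_US.UTF-8 is missing: enable it in /etc/locale.gen and run 'sudo locale-gen'"
fi
```

If the locale is missing, grep silently falls back to the C locale, where the accented letters in the bracket expressions are no longer treated as single characters.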
From what I can see, you want to match every diacritic variant of a given letter. POSIX bracket expressions provide equivalence classes for exactly this.
So you might want to write something based on classes such as [[=a=]] and [[=e=]]:
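As a rough sketch of the idea (not necessarily the exact expression this answer originally showed), each explicit list of accented letters from the question is replaced by the matching equivalence class. The variable name name_re is only an illustration, and which accented characters each class covers depends on the active locale:

```shell
#!/bin/sh
# [[=e=]] matches "e" plus every character the current locale declares
# equivalent to it; in a UTF-8 locale that typically includes è, é, ê and ë.
name_re='r[[=a=]]f[[=a=]][[=e=]]l s[[=i=]]m[[=o=]][[=e=]]'

# -i keeps the search case-insensitive, as in the question.
printf '%s\n' 'Rafael Simões Vieira' | grep -i "$name_re" \
    || echo "no match (is a UTF-8 locale active?)"
```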
Note that equivalence classes are not supported by PCRE (grep -P), so you need to use extended regular expressions (grep -E) instead:
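A sketch of how the command from the question could look with -E, under the assumption that the en_US.UTF-8 locale is installed. The wrapper function search_name is a hypothetical name used only to keep the example reusable:

```shell
#!/bin/sh
# Same search as in the question, with -P replaced by -E and the explicit
# character lists replaced by equivalence classes. ERE has no lazy quantifier,
# so ".*?" becomes ".*"; that does not change which lines match.
search_name() {
    LANG=en_US.UTF-8 grep -E -R -i -I -H -A1 \
        "^name#.*r[[=a=]]f[[=a=]][[=e=]]l s[[=i=]]m[[=o=]][[=e=]].*#.*#.*#.*#.*$" "$@"
}
```

It would be called with the same file arguments as before, for example search_name image_args_*.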