How to change case of a UTF file
I have a UTF file in uppercase, and I want to change all words to lowercase.
I have tried:
`tr '[:upper:]' '[:lower:]' < input.txt > output.txt`
But that changes only characters without an accent.
In the end, the simplest way I found was to use AWK:
`awk '{print tolower($0)}' < input.txt > output.txt`
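One caveat, offered as a hedged note: whether `tolower` touches accented letters depends on the awk implementation and the current locale (gawk in a UTF-8 locale generally does; some other awks only handle ASCII). A quick check could look like the sketch below; the helper name `awk_handles_utf8` is made up for illustration.

```shell
# Hypothetical helper: succeeds if this system's awk lowercases a non-ASCII letter.
awk_handles_utf8() {
    [ "$(printf 'Ñ\n' | awk '{ print tolower($0) }')" = "ñ" ]
}

if awk_handles_utf8; then
    echo "awk tolower handles accented letters here"
else
    echo "awk tolower looks ASCII-only here; check the locale or try gawk"
fi
```

If the check fails, the AWK one-liner will leave characters such as Ñ unchanged, just like the `tr` attempt in the question.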
This is because the default character classes only work on standard ASCII, which does not include most of the international accented characters. If you have a defined set of those characters, the easiest way would be to simply add the mapping from special uppercase character to special lowercase character manually:
`tr 'ÄÖÜ[:upper:]' 'äöü[:lower:]'`
If you only have a few accented characters, this is workable.
No, the issue is that `tr` is not Unicode-aware.
$ grep -o '[[:upper:]]' <<< JalapeÑo
J
Ñ
$ tr '[:upper:]' '[:lower:]' <<< JalapeÑo
jalapeÑo
The reason to use `[:upper:]`, etc., is to handle characters outside ASCII. Otherwise, you could just use `[A-Z]` and `[a-z]`. That's also why PCRE has a character class called `[:ascii:]`:
$ perl -pe 's/[[:ascii:]]//g' <<< jalapeño
ñ