
How to convert text file to lowercase in UNIX (but in UTF-8)

I need to convert all text to lowercase, but not with the traditional "tr" command, because it does not handle non-ASCII characters in UTF-8 text properly.

Is there a nice way to do that? I need a UNIX filter so I can process the text in a pipe.
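To illustrate the problem, here is a small Python sketch (the function names are made up for illustration). Python's bytes.lower() stands in for a byte-wise, ASCII-only mapping; it only approximates what a byte-oriented tool like tr does and is not tr itself. Decoding to text first lets the Unicode case rules apply.

```python
# Sketch: byte-wise ASCII lowercasing vs. Unicode-aware lowercasing.

def bytewise_lower(data: bytes) -> bytes:
    # bytes.lower() only maps the ASCII letters A-Z; the two UTF-8 bytes
    # that encode "É" are left untouched.
    return data.lower()


def unicode_lower(data: bytes) -> bytes:
    # Decode to text first, lowercase with Unicode case rules, re-encode.
    return data.decode("utf-8").lower().encode("utf-8")


if __name__ == "__main__":
    sample = "Some StrAngÉ LeTTeRs 123".encode("utf-8")
    print(bytewise_lower(sample).decode("utf-8"))  # some strangÉ letters 123
    print(unicode_lower(sample).decode("utf-8"))   # some strangé letters 123
```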


GNU sed should be able to handle Unicode, as long as your locale uses UTF-8. Try:

$ echo 'Some StrAngÉ LeTTeRs 123' | sed -e 's/./\L\0/g'
some strangé letters 123


If you can use Python, code like this can help:

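# tolowerutf8.py (Python 3)
import sys
import codecs

# Wrap the raw byte streams so input is decoded from, and output encoded to, UTF-8
utf8input = codecs.getreader("utf-8")(sys.stdin.buffer)
utf8output = codecs.getwriter("utf-8")(sys.stdout.buffer)

utf8output.write(utf8input.read().lower())
utf8output.flush()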
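The script above reads all of stdin into memory before lowercasing it. For very large files a line-by-line variant may be preferable. This is a minimal Python 3 sketch under that assumption; lower_stream is a hypothetical name, not part of the original answer.

```python
import io
from typing import BinaryIO


def lower_stream(src: BinaryIO, dst: BinaryIO) -> None:
    # Wrap the byte streams in text layers with an explicit UTF-8 encoding,
    # so the result does not depend on console or locale settings.
    # newline="" passes line endings (\n or \r\n) through unchanged.
    reader = io.TextIOWrapper(src, encoding="utf-8", newline="")
    writer = io.TextIOWrapper(dst, encoding="utf-8", newline="")
    # Process one line at a time instead of reading the whole input.
    for line in reader:
        writer.write(line.lower())
    writer.flush()
    # Detach so the wrappers do not close the caller's streams.
    reader.detach()
    writer.detach()
```

To use it as a filter, put it in a script that ends with lower_stream(sys.stdin.buffer, sys.stdout.buffer) (after import sys) and run it in a pipe the same way, e.g. python tolower_stream.py < big.txt > lower.txt (script name hypothetical).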

On my Windows machine (sorry :) I can use it as a filter:

cat big.txt | python tolowerutf8.py > lower.txt
