开发者

grep unicode 16 support

I use TextEdit on macosx created two files, same contents with different encodings, the开发者_运维百科n

grep xxx filename_UTF-16

nothing

grep xxx filename_UTF-8

xxxxxxx xxxxxxyyyyyy

grep did not support UTF-16?


iconv -f UTF-16 -t UTF-8 yourfile | grep xxx


You could always try converting first to utf-8:

iconv -f utf-16 -t utf-8 filename | grep xxxxx


Use ripgrep utility instead of grep which can support grepping UTF-16 files. Install by: brew install ripgrep.

Then run:

rg xxx filename_UTF-16

ripgrep supports searching files in text encodings other than UTF-8, such as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for automatically detecting UTF-16 is provided. Other text encodings must be specifically specified with the -E/--encoding flag.)


Define the following Ruby's shell function:

grep16() { ruby -e "puts File.open('$2', mode:'rb:BOM|UTF-16LE').readlines.grep(Regexp.new '$1'.encode(Encoding::UTF_16LE))"; }

Then use it as:

grep16 xxx filename_UTF-16

See: How to use Ruby's readlines.grep for UTF-16 files?

For more suggestions, check: grepping binary files and UTF16


You could also use ugrep which supports UTF-8, UTF-16, UTF-32 and other file formats according to its readme:

ugrep searches UTF-encoded input when a UTF BOM (byte order mark). Option --encoding permits many other file formats to be searched, such as ISO-8859-1, EBCDIC, and code pages 437, 850, 858, 1250 to 1258.

ugrep matches Unicode patterns by default (disabled with option -U). The regular expression syntax is POSIX ERE compliant, extended with Unicode character classes, lazy quantifiers, and negative patterns to skip unwanted pattern matches to produce more precise results.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜