grep UTF-16 support
I used TextEdit on Mac OS X to create two files with the same contents but different encodings, then ran:
grep xxx filename_UTF-16
(no output)
grep xxx filename_UTF-8
xxxxxxx xxxxxxyyyyyy
Does grep not support UTF-16?
iconv -f UTF-16 -t UTF-8 yourfile | grep xxx
You could always try converting to UTF-8 first:
iconv -f utf-16 -t utf-8 filename | grep xxxxx
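To see why grep prints nothing for the UTF-16 file, it helps to look at the raw bytes. The Ruby sketch below uses the sample line from the question (the variable names are illustrative) and checks whether the bytes of `xxx` occur in the UTF-8 and UTF-16LE encodings of the same text. In UTF-16 every ASCII character takes two bytes, one of them 0x00, so a byte-oriented search for `xxx` cannot match; converting back to UTF-8 first, which is what the `iconv` pipe does, makes the bytes searchable again.

```ruby
# Illustrative line from the question.
text = "xxxxxxx xxxxxxyyyyyy\n"

utf8   = text.encode(Encoding::UTF_8).b     # raw UTF-8 bytes
utf16  = text.encode(Encoding::UTF_16LE).b  # raw UTF-16LE bytes (no BOM)
needle = "xxx".b                            # the bytes grep searches for

# UTF-8: one byte per ASCII character, so the needle occurs as-is.
puts "UTF-8 contains xxx:  #{utf8.include?(needle)}"

# UTF-16LE: each ASCII character is followed by 0x00, so the bytes
# 0x78 0x78 0x78 never appear next to each other.
puts "UTF-16 contains xxx: #{utf16.include?(needle)}"
puts "UTF-16 bytes:        #{utf16.bytes.first(6).map { |b| format('%02x', b) }.join(' ')}"

# Converting to UTF-8 first (the iconv step) makes the bytes searchable again.
converted = utf16.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
puts "After conversion:    #{converted.b.include?(needle)}"
```

Running it prints `true`, `false`, the bytes `78 00 78 00 78 00`, and `true`.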
Use the ripgrep utility instead of grep; it can search UTF-16 files. Install it with: brew install ripgrep
Then run:
rg xxx filename_UTF-16
ripgrep supports searching files in text encodings other than UTF-8, such as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for automatically detecting UTF-16 is provided. Other text encodings must be specifically specified with the -E/--encoding flag.)
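Automatic UTF-16 detection of this kind is based on spotting a byte order mark (BOM) at the start of the file. The Ruby sketch below illustrates BOM sniffing in general; it is not ripgrep's code, `detect_encoding` and `decode` are hypothetical helpers, UTF-32 BOMs are ignored for brevity, and input without a BOM is assumed to be UTF-8 (which is why other encodings need to be named explicitly).

```ruby
# Known byte order marks, checked in insertion order.
# (UTF-32 BOMs are ignored for brevity.)
BOMS = {
  "\xEF\xBB\xBF".b => Encoding::UTF_8,
  "\xFF\xFE".b     => Encoding::UTF_16LE,
  "\xFE\xFF".b     => Encoding::UTF_16BE
}.freeze

# Hypothetical helper: returns [encoding, bom_length] for raw bytes,
# assuming UTF-8 when no BOM is present.
def detect_encoding(raw)
  BOMS.each do |bom, enc|
    return [enc, bom.bytesize] if raw.start_with?(bom)
  end
  [Encoding::UTF_8, 0]
end

# Hypothetical helper: strips the BOM and returns the text as UTF-8.
def decode(bytes)
  raw = bytes.b
  enc, skip = detect_encoding(raw)
  raw.byteslice(skip, raw.bytesize).force_encoding(enc).encode(Encoding::UTF_8)
end

# A UTF-16LE "file" with a BOM, built in memory for the example.
sample = "\xFF\xFE".b + "xxx yyy\n".encode(Encoding::UTF_16LE).b
p decode(sample)
```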
Define the following shell function, which uses Ruby:
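# Reads $2 as UTF-16 (BOM-aware, UTF-16LE if there is no BOM), converts it to UTF-8 and prints the lines matching $1
grep16() { ruby -e 'File.open(ARGV[1], "rb:BOM|UTF-16LE:UTF-8") { |f| puts f.readlines.grep(Regexp.new(ARGV[0])) }' "$1" "$2"; }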
Then use it as:
grep16 xxx filename_UTF-16
See: How to use Ruby's readlines.grep for UTF-16 files?
For more suggestions, check: grepping binary files and UTF16
You could also use ugrep, which, according to its README, supports UTF-8, UTF-16, UTF-32 and other file formats:
ugrep searches UTF-encoded input when a UTF BOM (byte order mark) is present. Option --encoding permits many other file formats to be searched, such as ISO-8859-1, EBCDIC, and code pages 437, 850, 858, 1250 to 1258. ugrep matches Unicode patterns by default (disabled with option -U). The regular expression syntax is POSIX ERE compliant, extended with Unicode character classes, lazy quantifiers, and negative patterns to skip unwanted pattern matches to produce more precise results.
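Searching with an explicitly named encoding amounts to decoding the bytes first and then matching in Unicode. The Ruby sketch below shows that idea for ISO-8859-1 input with a Unicode letter class; `search_latin1` is a hypothetical helper for illustration and says nothing about how ugrep is implemented internally.

```ruby
# Hypothetical helper: decode ISO-8859-1 bytes (the encoding a user would
# name with an option such as ugrep's --encoding), transcode to UTF-8,
# and return the lines matching a Unicode-aware pattern.
def search_latin1(bytes, pattern)
  text = bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
  text.each_line.select { |line| line.match?(pattern) }
end

# In ISO-8859-1, "é" is the single byte 0xE9.
latin1 = "café au lait\nplain tea\n".encode(Encoding::ISO_8859_1).b

# \p{L} is a Unicode letter class; it matches "é" once the bytes are decoded.
p search_latin1(latin1, /caf\p{L}/)
```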