
Grep regex doesn't work with Cygwin on Windows

I'm trying to find all non-ASCII characters in a file using grep:

grep '[^\x00-\x7F]' myfile

I think this should work, but it returns every line in the file.

Any ideas?


grep doesn't recognize the \x syntax.

printf '%s\n' Hello '\x48' | grep '\x48'

prints

\x48

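('H' is character 0x48, so if grep understood \x48 as a hex escape, it would have printed Hello instead. It treats \x as a plain x and matches the literal text x48.)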

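Your grep is matching every line because, inside a bracket expression, the backslash is an ordinary character. So [^\x00-\x7F] matches any character other than \, x, 0, 7, F, and anything in the range 0 .. \ (0x30 to 0x5C in ASCII order: digits, uppercase letters, and a few punctuation marks). Almost every line contains at least one such character.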
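You can see this with a quick experiment. This is a sketch assuming GNU grep and a POSIX shell; `LC_ALL=C` is added so that the range `0-\` means plain byte order. The line `HELLO` consists only of characters in the 0x30 to 0x5C range and `x70F` only of characters listed literally, so neither matches, while `hello` does.

```shell
# Inside [...] the backslash is literal, so '[^\x00-\x7F]' means
# "any character except \, x, 0, 7, F and the range 0-\" (0x30..0x5C).
# LC_ALL=C makes that range follow byte values.
printf '%s\n' hello HELLO x70F | LC_ALL=C grep '[^\x00-\x7F]'
# prints only: hello
```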

Note that this is not specific to Cygwin.

GNU grep (which is what Cygwin has) has an experimental -P option that tells it to use Perl-like regular expressions; with that option, it does recognize the \x syntax.
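For example, something like the following should list the offending lines with their line numbers. This is a sketch assuming a GNU grep built with PCRE support; `find_non_ascii` is just a hypothetical helper name. Running it under `LC_ALL=C` makes the match work byte by byte, so any byte from 0x80 to 0xFF counts, which covers every byte of a UTF-8 multibyte character.

```shell
# Hypothetical helper: print each line (with its line number) that contains
# at least one byte outside 0x00..0x7F. Reads the named files, or stdin.
#   -P        Perl-compatible regex, so \x00 and \x7F are real hex escapes
#   LC_ALL=C  treat the input as raw bytes rather than locale characters
find_non_ascii() {
    LC_ALL=C grep -n -P '[^\x00-\x7F]' "$@"
}

# Usage: find_non_ascii myfile
```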


Found that perl works:

perl -n -e 'print if /[^\x00-\x7F]/' file


Grep may be interpreting multibyte (i.e., non-ASCII) characters as several single-byte (ASCII) characters. (For example, in a UTF-16 big-endian file the character ∩ [U+2229] is stored as the bytes 0x22 0x29, which read as " [U+0022] followed by ) [U+0029].) You'll need to figure out the file's encoding and use a more sophisticated tool that understands Unicode.
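If the file does turn out to be UTF-16, one option is to convert it to UTF-8 with `iconv` first and then search for bytes above 0x7F. This is a sketch assuming GNU/glibc `iconv`, a GNU grep with `-P`, and that you already know the source encoding (`UTF-16BE` is an assumption here, and `find_non_ascii_utf16` is a hypothetical name). Searching the raw UTF-16BE bytes would miss ∩ entirely, because neither 0x22 nor 0x29 is above 0x7F.

```shell
# Hypothetical helper: convert a UTF-16BE file to UTF-8, then report lines
# containing any non-ASCII byte. In UTF-8 every byte of a multibyte
# sequence is >= 0x80, so a byte-level search works once the text is UTF-8.
find_non_ascii_utf16() {
    iconv -f UTF-16BE -t UTF-8 "$1" | LC_ALL=C grep -n -P '[^\x00-\x7F]'
}

# Usage: find_non_ascii_utf16 myfile
# For a file holding "a∩" as UTF-16BE (bytes 00 61 22 29 00 0A),
# this prints 1:a∩
```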
