Grep regex doesn't work with Cygwin on Windows
I'm trying to find all non-ASCII characters in a file using grep:
grep '[^\x00-\x7F]' myfile
I think this should work, but it returns every line in the file.
Any ideas?
grep doesn't recognize the \x syntax.
( echo Hello ; echo '\\x48' ) | grep '\x48'
prints
\x48
('H' is character 0x48, so the Hello line would also have been printed if grep understood \x.)
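Your grep is matching all lines because a backslash is literal inside a bracket expression, so [^\x00-\x7F] matches any character other than \, x, 0, 7, F, and anything in the range 0 .. \, and each line of your file contains such a character.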
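To make that reading of the bracket expression concrete, here is a small sketch that feeds three sample lines through the pattern from the question. The sample lines are made up for illustration, and LC_ALL=C is an assumption added here so that the range 0 .. \ follows plain ASCII order.

```shell
# The pattern from the question; inside [...] the backslash is an ordinary character.
pattern='[^\x00-\x7F]'

# HELLO: every letter lies in the range 0 .. \ (0x30-0x5C), so the line is not matched.
# x07F:  every character is listed explicitly or falls in that range, so it is not matched.
# hello: lowercase letters lie outside the excluded set, so the line is matched.
printf '%s\n' 'HELLO' 'x07F' 'hello' | LC_ALL=C grep "$pattern"
# prints only: hello
```

Any line containing a space, a lowercase letter other than x, or most punctuation will match, which is why an ordinary text file appears to match on every line.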
Note that this is not specific to Cygwin.
GNU grep (which is what Cygwin has) has an experimental -P option that tells it to use Perl-like regular expressions; with that option, it does recognize the \x syntax.
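A minimal sketch of the -P approach, assuming a GNU grep that was built with PCRE support. The function name find_non_ascii and the sample file are hypothetical, and forcing LC_ALL=C is a choice made here so that matching is byte-wise regardless of the file's encoding.

```shell
# Hypothetical helper: print, with line numbers, every line that contains
# at least one byte outside the ASCII range 0x00-0x7F.
find_non_ascii() {
    LC_ALL=C grep -n -P '[^\x00-\x7F]' "$1"
}

# Sample file for illustration: line 2 ends in the UTF-8 bytes C3 A9 (e with acute accent).
sample=$(mktemp)
printf 'plain\ncaf\303\251\n' > "$sample"
find_non_ascii "$sample"    # reports line 2 only
rm -f "$sample"
```

For the file in the question, the call would be find_non_ascii myfile.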
Found that perl works:
perl -n -e 'print if /[^\x00-\x7F]/' file
Grep may be interpreting multibyte (i.e., non-ASCII) characters as several single-byte (ASCII) characters. (For example, in big-endian UTF-16, this lovely ∩ character [U+2229] is stored as the bytes 0x22 0x29 and would show up as " [U+0022] followed by a ) [U+0029].) You'll need to figure out the file's encoding and use a more sophisticated tool that knows Unicode.
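The byte-level ambiguity can be checked with iconv. This is only a sketch: it assumes an iconv that supports UTF-16BE and UTF-8, and it picks big-endian UTF-16 as the example encoding, which the file in the question may or may not use.

```shell
# The two ASCII characters " and ) are the bytes 0x22 0x29.
# Decoded as big-endian UTF-16, the same two bytes form the single character U+2229.
decoded=$(printf '")' | iconv -f UTF-16BE -t UTF-8)
printf '%s\n' "$decoded"
```

Once the real encoding is known, the same iconv call with the appropriate -f value can convert the whole file to UTF-8 before searching it.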