'git grep' and word boundaries on Mac OS X and BSD
I run git grep "\<blah\>" regularly on my Linux development server, but I just discovered that \< and \> do not work on Mac (Mac OS X 10.6.8): the search does not find anything. Is the regular expression syntax different on Mac?
I tried using git grep -E "\<blah\>" but to no avail! :-(
After struggling with this, too, I found this very helpful post on a BSD mailing list. So here's the (albeit rather ugly) solution:
git grep "[[:<:]]blah[[:>:]]"
The -w flag of git-grep also works, but sometimes you want to match only the beginning or end of a word.
Update: This has changed in OS X 10.9 "Mavericks". Now you can use \<, \>, and \b. [[:<:]] and [[:>:]] are no longer supported.
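Since the accepted syntax depends on the platform and OS version, one option is to probe for it at run time instead of hard-coding it. This is only a rough sketch: detect_wb_style and the labels gnu, bsd, and none are names made up for illustration, not part of git. It assumes git and mktemp are on the PATH and does its probing in a throwaway repository.

```shell
#!/bin/sh
# Sketch only: detect_wb_style and the style labels are hypothetical.
detect_wb_style() {
    probe=$(mktemp -d) || { echo none; return; }
    style=$(
        cd "$probe" || exit
        git init -q >/dev/null 2>&1 || exit
        printf 'blah\n' > probe.txt
        git add probe.txt >/dev/null 2>&1 || exit
        # Try the GNU-style anchors first, then the BSD-style ones.
        if git grep -q '\<blah\>' 2>/dev/null; then
            echo gnu
        elif git grep -q '[[:<:]]blah[[:>:]]' 2>/dev/null; then
            echo bsd
        fi
    )
    rm -rf "$probe"
    # Empty output means neither syntax was accepted (or git is unavailable).
    echo "${style:-none}"
}

case "$(detect_wb_style)" in
    gnu) pattern='\<blah\>' ;;
    bsd) pattern='[[:<:]]blah[[:>:]]' ;;
    *)   pattern='blah' ;;   # no anchors available; consider git grep -w
esac
printf 'use: git grep "%s"\n' "$pattern"
```

The probe runs git grep itself rather than the system grep, because the two are not guaranteed to use the same regex library.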
I guess it's caused by the difference between the BSD and Linux (GNU) regex libraries.
See if the -w (match pattern only at word boundary) option to git grep does it for you:
$ git grep -w blah
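To see what -w does and does not match, here is a small sketch that builds a scratch repository and searches it. The file name words.txt and its contents are invented for the demonstration; it assumes git (1.8.5 or later, for -C) and mktemp are available.

```shell
#!/bin/sh
# Scratch repository with three sample lines (made-up data).
demo=$(mktemp -d)
printf 'blah\nblahblah\nfoo blah bar\n' > "$demo/words.txt"
git -C "$demo" init -q
git -C "$demo" add words.txt

# -w only reports matches that are delimited by non-word characters,
# so the line "blahblah" is skipped.
matches=$(git -C "$demo" grep -w blah)
printf '%s\n' "$matches"
# prints:
#   words.txt:blah
#   words.txt:foo blah bar

rm -rf "$demo"
```

Note that -w anchors both ends of the match, so it cannot express "starts with blah" on its own.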
You can compile git with PCRE support and use git grep -P "\bblah\b" for word boundaries.
Here's a guide on how to compile git using OSX Homebrew: http://realultimateprogramming.blogspot.com/2012/01/how-to-enable-git-grep-p-on-os-x-using.html
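Before rebuilding anything, it may be worth checking whether the installed git already accepts -P. The following is a sketch under assumptions: has_pcre is a hypothetical helper name, and it simply tries a -P search in a temporary repository and treats any failure as "not available".

```shell
#!/bin/sh
# Sketch only: has_pcre is a hypothetical helper, not a git command.
has_pcre() {
    probe=$(mktemp -d) || return 1
    printf 'blah\n' > "$probe/probe.txt"
    git -C "$probe" init -q >/dev/null 2>&1 &&
    git -C "$probe" add probe.txt >/dev/null 2>&1 &&
    git -C "$probe" grep -q -P '\bblah\b' 2>/dev/null
    rc=$?
    rm -rf "$probe"
    return "$rc"
}

if has_pcre; then
    pcre_status=available
else
    pcre_status=missing
fi
echo "git grep -P is $pcre_status"
```

A build without PCRE support rejects -P with a fatal error, which the helper reports as "missing".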
If you do use -P, make sure to use Git 2.40 (Q1 2023): "grep -P" learned to use Unicode Character Property to grok character classes when processing \b and \w etc.
See commit acabd20 (08 Jan 2023) by Carlo Marcelo Arenas Belón (carenas).
(Merged by Junio C Hamano -- gitster -- in commit 557d93a, 27 Jan 2023)
grep: correctly identify utf-8 characters with \{b,w} in -P
Signed-off-by: Carlo Marcelo Arenas Belón
Acked-by: Ævar Arnfjörð Bjarmason
When UTF is enabled for a PCRE match, the corresponding flags are added to the pcre2_compile() call, but PCRE2_UCP wasn't included.
This prevents extending the meaning of the character classes to include those new valid characters and therefore result in failed matches for expressions that rely on that extention, for ex:
$ git grep -P '\bÆvar'
Add PCRE2_UCP so that \w will include Æ and therefore \b could correctly match the beginning of that word.
This has an impact on performance that has been estimated to be between 20% to 40% and that is shown through the added performance test.
That means those patterns will work, with any character:
'\bhow'
'\bÆvar'
'\d+ \bÆvar'
'\bBelón\b'
'\w{12}\b'
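To find out how a given installation treats one of these patterns, something like the sketch below can be used. The file name names.txt and the variable ucp_result are invented for the example. The outcome is not shown because it depends on the Git version, on whether git was built with PCRE, and (an assumption on my part) on running under a UTF-8 locale.

```shell
#!/bin/sh
# Scratch repository containing one line with a non-ASCII word (made-up data).
demo=$(mktemp -d)
printf 'Acked-by: Ævar\n' > "$demo/names.txt"
git -C "$demo" init -q >/dev/null 2>&1
git -C "$demo" add names.txt >/dev/null 2>&1

# Any failure, including a git built without PCRE, counts as "not matched".
if git -C "$demo" grep -q -P '\bÆvar' 2>/dev/null; then
    ucp_result=matched
else
    ucp_result='not matched'
fi
echo "\\bÆvar: $ucp_result"

rm -rf "$demo"
```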