
'git grep' and word boundaries on Mac OS X and BSD

I run git grep "\<blah\>" regularly on my Linux development server, but I just discovered that I am not able to use \< and \> on Mac (Mac OS X 10.6.8): the search does not find anything. Is the regular expression syntax different on Mac?

I tried using git grep -E "\<blah\>" but to no avail! :-(


After struggling with this, too, I found this very helpful post on a BSD mailing list. So here's the (albeit rather ugly) solution:

git grep "[[:<:]]blah[[:>:]]"

The -w flag of git grep also works, but sometimes you only want to match the beginning or the end of a word.

Update: This has changed in OS X 10.9 "Mavericks". Now you can use \<, \>, and \b. [[:<:]] and [[:>:]] are no longer supported.
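On Linux, or on OS X 10.9 and later, \< and \> can anchor just one side of a word, which -w cannot do. Below is a minimal sketch, assuming git is on PATH and the platform regex library understands these GNU-style anchors. The throwaway repository and the file name words.txt are made up for the illustration.

```shell
#!/bin/sh
# Sketch: one-sided and two-sided word anchors with git grep.
# Assumes git is on PATH and a regex library that supports \< and \>
# (glibc on Linux, or OS X 10.9+). Repo and words.txt are hypothetical.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .

# Cases: whole word, doubled word, prefix of a word, suffix of a word.
printf '%s\n' 'say blah here' 'blahblah' 'blahs' 'unblah' > words.txt
git add words.txt   # git grep searches tracked files, so stage the file

printf '%s\n' 'start of word:'
git grep -n '\<blah' -- words.txt     # lines 1, 2, 3
printf '%s\n' 'end of word:'
git grep -n 'blah\>' -- words.txt     # lines 1, 2, 4
printf '%s\n' 'whole word:'
git grep -n '\<blah\>' -- words.txt   # line 1 only
```

On Mac OS X 10.6.8 the same three searches would need [[:<:]] and [[:>:]] in place of \< and \>.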


I guess it's caused by the difference between the BSD and Linux (GNU) regex libraries.

See if the -w (match pattern only at word boundary) option to git grep does it for you:

$ git grep -w blah
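The -w check is done inside git itself after the regex match (a word character is a letter, a digit or an underscore), so it should behave the same on BSD/OS X and on Linux. A minimal sketch showing which lines it accepts, assuming git is on PATH; the throwaway repository and words.txt are hypothetical:

```shell
#!/bin/sh
# Sketch: which lines `git grep -w` accepts as whole-word matches.
# Assumes git is on PATH; the repo and words.txt are hypothetical.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .

printf '%s\n' 'say blah here' 'blahblah' 'foo_blah' 'blah-ish' > words.txt
git add words.txt   # git grep searches tracked files, so stage the file

printf '%s\n' 'without -w:'
git grep -n blah -- words.txt      # all four lines
printf '%s\n' 'with -w:'
git grep -n -w blah -- words.txt   # lines 1 and 4: "_" counts as a word char, "-" does not
```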


You can compile git with PCRE support and use git grep -P "\bblah\b" for word boundaries.

Here's a guide on how to compile git using OSX Homebrew: http://realultimateprogramming.blogspot.com/2012/01/how-to-enable-git-grep-p-on-os-x-using.html
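Not every git build has PCRE, so a script may want to check before relying on -P. This is a minimal sketch, assuming git is on PATH; the repository and words.txt are hypothetical. git grep exits non-zero both for "no match" and for "-P not supported". Some git versions also answer plain-string patterns without PCRE, so the probe uses a pattern with a regex metacharacter that is sure to match.

```shell
#!/bin/sh
# Sketch: use git grep -P with \b only if this git was built with PCRE.
# Assumes git is on PATH; the repo and words.txt are hypothetical.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .

printf '%s\n' 'say blah here' 'blahblah' 'foo_blah' 'blah-ish' > words.txt
git add words.txt   # git grep searches tracked files, so stage the file

# '.' is a regex (not a plain string) and matches any non-empty line.
if git grep -q -P '.' -- words.txt 2>/dev/null; then
    git grep -n -P '\bblah\b' -- words.txt   # lines 1 and 4
else
    printf '%s\n' 'this git has no PCRE support; rebuild it with PCRE to use -P' >&2
fi
```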


If you do use -P, make sure to use Git 2.40 (Q1 2023) or later: "grep -P" learned to use Unicode Character Property to grok character classes when processing \b, \w, etc.

See commit acabd20 (08 Jan 2023) by Carlo Marcelo Arenas Belón (carenas).
(Merged by Junio C Hamano -- gitster -- in commit 557d93a, 27 Jan 2023)

grep: correctly identify utf-8 characters with \{b,w} in -P

Signed-off-by: Carlo Marcelo Arenas Belón
Acked-by: Ævar Arnfjörð Bjarmason

When UTF is enabled for a PCRE match, the corresponding flags are added to the pcre2_compile() call, but PCRE2_UCP wasn't included.

This prevents extending the meaning of the character classes to include those new valid characters and therefore results in failed matches for expressions that rely on that extension, for example:

$ git grep -P '\bÆvar'

Add PCRE2_UCP so that \w will include Æ and therefore \b could correctly match the beginning of that word.

This has an impact on performance, estimated to be between 20% and 40%, which is shown through the added performance test.

That means patterns like these will work with non-ASCII word characters as well:

'\bhow' 
'\bÆvar'
'\d+ \bÆvar'
'\bBelón\b'
'\w{12}\b'