Regular expression fails on unicode
I'm trying to find the string "C#" in a text using PHP and a regular expression.
I'm using
\bc\x{0023}\b
But it doesn't work at all.
\bc\x{0023}
works, but that's not a solution for me.
Any clue?
It's because the escape sequence \b
means a word boundary. The PHP manual defines a "word" character as follows:
"A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word"."
A word boundary is the position between a word character and a non-word character. In other words, \b matches only where a word character sits on one side and a non-word character (or the start or end of the string) sits on the other. The problem is that #
is not a word character. Thus, unless #
is followed by a word character, #\b
will never match.
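To see the boundary rule in action, here is a minimal sketch. The helper name hasCSharpWithBoundary and the sample strings are made up for illustration; only preg_match is a real PHP function.

```php
<?php
// Hypothetical helper: tests the pattern from the question, with a literal #
// in place of \x{0023} (both denote the same character).
function hasCSharpWithBoundary(string $text): bool
{
    // The trailing \b needs a word character right after #,
    // because # itself is a non-word character.
    return preg_match('/\bc#\b/i', $text) === 1;
}

// # followed by a space: non-word next to non-word, so no boundary.
var_dump(hasCSharpWithBoundary('I like C# a lot')); // bool(false)

// # at the end of the string: still no word character on either side of the position.
var_dump(hasCSharpWithBoundary('C#'));              // bool(false)

// # followed by a digit: non-word next to word, so \b matches.
var_dump(hasCSharpWithBoundary('C#7 is out'));      // bool(true)
```

So the original pattern only matches in the one case you probably don't care about, where "C#" is glued to a following letter, digit or underscore.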
Perhaps you should define more clearly, using character classes, what you want to match. For example /\bc#(?![a-z])/i
(that is, C# not followed by a letter in the range a-z; the i flag makes this case-insensitive)
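A quick sketch of how the lookahead version behaves, assuming you want "C#" as a standalone token. The function name findsCSharp and the test strings are hypothetical; adjust the character class in the lookahead to whatever should count as "still part of the word" in your data.

```php
<?php
// Hypothetical helper wrapping the suggested pattern.
function findsCSharp(string $text): bool
{
    // \b         : "c" must start a word
    // c#         : literal "c#" (case-insensitive because of the i flag)
    // (?![a-z])  : negative lookahead, the next character must not be a letter a-z
    return preg_match('/\bc#(?![a-z])/i', $text) === 1;
}

var_dump(findsCSharp('I like C# a lot')); // bool(true)  - followed by a space
var_dump(findsCSharp('Written in c#'));   // bool(true)  - end of string
var_dump(findsCSharp('C#x'));             // bool(false) - a letter follows #
var_dump(findsCSharp('abc# def'));        // bool(false) - no word boundary before "c"
```

Note that a digit after # (as in "C#7") still matches with this class; widen the lookahead, for example to (?![a-z0-9_]), if that should be rejected too.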