
Regular expression fails on unicode

I'm trying to find the string "C#" in a text using PHP and a regular expression.

I'm using

\bc\x{0023}\b

But it doesn't match at all.

\bc\x{0023} 

works, but that's not a solution for me.

Any clue?


That's because the escape sequence \b means a word boundary. The PHP manual defines a word character as follows: "A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word"."

A word boundary is the position between a word character and a non-word character (or between a word character and the start or end of the string). The problem is that # is not a word character. Thus, unless # is followed by a word character, #\b will never match.
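To make that concrete, here is a minimal sketch of the behaviour. The sample strings are made up for illustration, the /i flag is an assumption (the question's pattern uses a lowercase c), and the pattern writes # literally, which is the same character as \x{0023}.

```php
<?php
// Sample inputs are hypothetical; they only illustrate the boundary rule.
$text = 'I write C# code';

// '#' and the space after it are both non-word characters,
// so there is no word boundary between them and the trailing \b fails.
$withBoundary = preg_match('/\bc#\b/i', $text);

// Dropping the trailing \b lets the pattern match "C#".
$withoutBoundary = preg_match('/\bc#/i', $text);

// Here '#' is followed by the word character 'x',
// so a boundary exists after '#' and the trailing \b succeeds.
$followedByWord = preg_match('/\bc#\b/i', 'C#x');

var_dump($withBoundary);    // int(0)
var_dump($withoutBoundary); // int(1)
var_dump($followedByWord);  // int(1)
```

So the trailing \b matches exactly when you do not want it to: only when "C#" runs straight into another letter, digit, or underscore.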

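Perhaps you should define what you want more precisely, using character classes or lookarounds. For example, /\bc#(?![a-z])/i uses a negative lookahead to match C# only when it is not followed by a character in the range a-z (upper or lower case, because of the i flag).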
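A quick way to check the suggested pattern is to run it over a few inputs. This is only a sketch: the helper name containsCSharp and the sample strings are made up for illustration and are not part of the question.

```php
<?php
// Hypothetical helper wrapping the suggested pattern.
function containsCSharp(string $text): bool
{
    // \b        : "c" must start at a word boundary
    // c#        : the literal characters, case-insensitive because of /i
    // (?![a-z]) : negative lookahead, the next character must not be a letter a-z
    return preg_match('/\bc#(?![a-z])/i', $text) === 1;
}

$samples = ['I write C# code', 'C#.', 'c#', 'C#x', 'abc#'];

foreach ($samples as $sample) {
    printf("%-16s %s\n", $sample, containsCSharp($sample) ? 'match' : 'no match');
}
```

The first three samples match, while "C#x" is rejected by the lookahead and "abc#" by the leading \b. Note that this pattern still accepts a digit or underscore right after the #; if that is not wanted, (?!\w) in place of (?![a-z]) would exclude every word character.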
