
PCRE: Different behaviour for \w on different servers

I'm using Kohana's routing system for my own application, and when I define the PCRE pattern for a tag segment of the URL, my localhost behaves differently from the production server.

I have this route:

Route::set( 'list', 'list(/tagged/<tags>)',
            array('tags'=>'[\w\d\-\+]+') );

This used to work fine, until the day someone used a tag that contained a non-"standard" character (ñ). On my localhost there is no problem, but on the production server the system is not able to find the route.

On the production server I had to modify the pattern and explicitly add 'ñ' to the allowed characters:

'\pL[\w\d\-\+ñ]+'

The question is, why? Ok, it works now that I added the 'ñ', but it is going to fail again sooner or later!


Have a look at the different Unicode character classes you can use here: http://www.regular-expressions.info/unicode.html#prop

With that said, you will be able to use something like this:

Route::set('list', 'list(/tagged/<tags>)', array('tags'=>'[\p{L}\p{N}\-\+]+'));
  1. \p{L} any kind of letter from any language.
  2. \p{N} any kind of numeric character in any script.

I've tested this out on ideone.com. View example.
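As a rough illustration of the pattern above outside Kohana, here is a minimal sketch. The helper name `tag_is_valid`, the `#` delimiters, the `^...$` anchors and the `u` modifier are all assumptions about how the route segment ends up being compiled, not something taken from Kohana itself.

```php
<?php
// Hypothetical helper: checks a tag against the character class above.
// The delimiters, anchors and the u modifier are assumptions made for
// this sketch; the router may wrap the pattern differently.
function tag_is_valid(string $tag): bool
{
    return preg_match('#^[\p{L}\p{N}\-\+]+$#u', $tag) === 1;
}

// "espa\xC3\xB1ol" is "español" written out as UTF-8 bytes.
$samples = ["espa\xC3\xB1ol", 'c++', 'php-5', 'two words'];
foreach ($samples as $tag) {
    printf("%s => %s\n", $tag, tag_is_valid($tag) ? 'match' : 'no match');
}
```

Because `\p{L}` and `\p{N}` are defined by Unicode properties rather than by the server's locale, this check should not depend on how `\w` happens to be set up on a given machine.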


Since the meaning of \w is locale-dependent, your production server probably has a clean C locale, whereas your development system includes extended character codes.
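One way to check this is to run the same small script on both servers and compare the output. This is only a diagnostic sketch: the array keys are made up for the example, and the last probe assumes a single-byte encoding such as ISO-8859-1, where ñ is the byte 0xF1.

```php
<?php
// Values worth comparing between the two servers.
$report = [
    'php_version'    => PHP_VERSION,
    'pcre_version'   => PCRE_VERSION,
    // Passing "0" only reads the current LC_CTYPE setting; nothing changes.
    'lc_ctype'       => setlocale(LC_CTYPE, '0'),
    // 1 if \w accepts the single byte 0xF1 in byte mode, 0 otherwise.
    'w_matches_0xF1' => preg_match('/^\w$/', "\xF1"),
];
print_r($report);
```

If the two machines report different locales or different PHP/PCRE versions, that is a likely place to start looking.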

IIRC, using the /u Unicode modifier allows \w to match all "letter" characters. If Kohana doesn't allow specifying modifiers, you could try adding it inline with (?u)[...], although I'm not certain PCRE accepts u as an inline option. Or maybe in your case you only need to repeat \p{L} within the square brackets:

'\pL[\w\d\-\+\p{L}]+'
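To see the difference side by side, here is a small sketch that uses only the bracketed class, with anchors added for the test. It pins the locale to C so that the byte-mode result does not depend on the host; the variable names are just for illustration.

```php
<?php
// Pin LC_CTYPE so the byte-mode result does not depend on the host.
setlocale(LC_CTYPE, 'C');

$tag = "ma\xC3\xB1ana"; // "mañana" written out as UTF-8 bytes

// Byte mode: "ñ" is the two bytes 0xC3 0xB1, and neither is a word
// character in the C locale.
$byteMode = preg_match('/^[\w\d\-\+]+$/', $tag);

// UTF-8 mode: whether \w also covers "ñ" here depends on the PHP/PCRE
// build, which is why the explicit property below is the safer choice.
$utf8Mode = preg_match('/^[\w\d\-\+]+$/u', $tag);

// Explicit property: \p{L} does not rely on how \w is defined.
$explicit = preg_match('/^[\w\d\-\+\p{L}]+$/u', $tag);

var_dump($byteMode, $utf8Mode, $explicit);
```

The first call is the case that breaks on a server with a plain C locale; the third is the variant suggested above.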