PCRE: Different behaviour for \w on different servers
I'm using Kohana's routing system for my own application, and when defining the PCRE pattern for a tag segment of the URL, my localhost behaves differently from the production server.
I have this route:
Route::set( 'list', 'list(/tagged/<tags>)',
array('tags'=>'[\w\d\-\+]+') );
This used to work fine until the day someone used a tag containing a non-"standard" character (ñ). On my localhost there is no problem, but on the production server the system is not able to find the route.
In production I had to modify the pattern and explicitly add 'ñ' to the allowed characters:
'\pL[\w\d\-\+ñ]+'
The question is, why? Ok, it works now that I added the 'ñ', but it is going to fail again sooner or later!
Have a look at the different Unicode character classes you can use here: http://www.regular-expressions.info/unicode.html#prop With that said, you will be able to use something like this:
Route::set('list', 'list(/tagged/<tags>)', array('tags'=>'[\p{L}\p{N}\-\+]+'));
\p{L}: any kind of letter from any language.
\p{N}: any kind of numeric character in any script.
I've tested this out on ideone.com. View example.
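For reference, here is a minimal sketch of how that character class behaves outside of Kohana. The helper name tag_is_allowed, the # delimiters and the uD modifiers are my own choices for illustration (Kohana may wrap the fragment differently), and the scalar type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: checks a tag against the \p{L}\p{N} class from the route.
// /u makes PCRE treat pattern and subject as UTF-8; D stops $ from matching
// before a trailing newline.
function tag_is_allowed(string $tag): bool
{
    return preg_match('#^[\p{L}\p{N}\-\+]+$#uD', $tag) === 1;
}

// "\u{00F1}" is ñ; written as an escape so the file encoding does not matter.
var_dump(tag_is_allowed("espa\u{00F1}ol")); // non-ASCII letter is inside \p{L}
var_dump(tag_is_allowed('c++'));            // + is listed explicitly
var_dump(tag_is_allowed('foo bar'));        // space is not in the class
```

Because \p{L} and \p{N} are defined by Unicode properties rather than by the server's locale, this check should give the same answer on both machines as long as PCRE was built with Unicode support.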
Since the meaning of \w is locale-dependent, your production server probably has a clean C locale, whereas your development system includes extended character codes.
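One way to check this is to run the same small script on both servers and compare the output. This is only a diagnostic sketch: the function name word_class_report and the array keys are made up for the example, and it assumes PHP 7 or later.

```php
<?php
// Hypothetical diagnostic: collects the settings that can influence \w
// and tries the sample string against a few patterns.
function word_class_report(string $sample): array
{
    return [
        'php'      => PHP_VERSION,
        'pcre'     => PCRE_VERSION,
        // Passing '0' only reads the current LC_CTYPE locale, it does not change it.
        'lc_ctype' => setlocale(LC_CTYPE, '0'),
        // Byte mode: the result depends on the locale's character tables.
        'w_bytes'  => preg_match('/^\w+$/', $sample),
        // UTF-8 mode: the result depends on the PHP/PCRE build.
        'w_utf8'   => preg_match('/^\w+$/u', $sample),
        // Unicode property: independent of the locale.
        'pL_utf8'  => preg_match('/^\p{L}+$/u', $sample),
    ];
}

print_r(word_class_report("\u{00F1}")); // ñ as a UTF-8 string
```

If the 'lc_ctype', 'php' or 'pcre' entries differ between the two machines, that is a likely place to look for the cause of the different \w results.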
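IIRC using the /u Unicode modifier allows \w to match all "letter" characters. If Kohana doesn't allow specifying modifiers, note that PCRE has no (?u) inline flag as far as I know; its equivalents are the (*UTF8) and (*UCP) verbs, which are only recognized at the very start of the whole pattern, so they cannot go inside a route fragment. Or maybe in your case you only need to repeat \p{L} within the square brackets: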
'\pL[\w\d\-\+\p{L}]+'
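A quick sketch of what that pattern accepts, assuming the fragment ends up anchored inside a UTF-8 pattern. The helper name, delimiters and modifiers are my choice for the example and not necessarily what Kohana generates.

```php
<?php
// Hypothetical helper wrapping the fragment above in an anchored UTF-8 pattern.
function tag_matches(string $tag): bool
{
    return preg_match('#^\pL[\w\d\-\+\p{L}]+$#uD', $tag) === 1;
}

var_dump(tag_matches("\u{00F1}and\u{00FA}")); // "ñandú": leading letter, then letters
var_dump(tag_matches('a'));                   // a single character
var_dump(tag_matches('2pac'));                // starts with a digit
```

Note that the leading \pL consumes one letter and the bracket class then needs at least one more character, so single-character tags and tags starting with a digit are rejected. If that is not intended, dropping the leading \pL keeps the rest of the behaviour.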