Are the PHP preg_functions multibyte safe?
There are no multibyte 'preg' functions available in PHP, so does that mean the default preg_functions are all mb safe?开发者_开发百科 Couldn't find any mention in the php documentation.
pcre supports utf8 out of the box, see documentation for the 'u' modifier.
Illustration (\xC3\xA4 is the utf8 encoding for the german letter "ä")
echo preg_replace('~\w~', '@', "a\xC3\xA4b");
this echoes "@@¤@" because "\xC3" and "\xA4" were treated as distinct symbols
echo preg_replace('~\w~u', '@', "a\xC3\xA4b");
(note the 'u') prints "@@@" because "\xC3\xA4" were treated as a single letter.
PCRE can support UTF-8 and other Unicode encodings, but it has to be specified at compile time. From the man page for PCRE 8.0:
The current implementation of PCRE corresponds approximately with Perl 5.10, including support for UTF-8 encoded strings and Unicode general category properties. However, UTF-8 and Unicode support has to be explicitly enabled; it is not the default. The Unicode tables correspond to Unicode release 5.1.
PHP currently uses PCRE 7.9; your system might have an older version.
Taking a look at the PCRE lib that comes with PHP 5.2, it appears that it's configured to support Unicode properties and UTF-8. Same for the 5.3 branch.
No, they are not. See the question preg_match and UTF-8 in PHP for example.
No, you need to use the multibyte string functions like mb_ereg
Some of my more complicated preg functions:
(1a) validate username as alphanumeric + underscore:
preg_match('/^[A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)*$/',$username)
(1b) possible UTF alternative:
preg_match('/^[A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)*$/u',$username)
(2a) validate email:
preg_match("/^([a-z0-9\+_\-]+)(\.[a-z0-9\+_\-]+)*@([a-z0-9\-]+\.)+[a-z]{2,6}$/ix",$email))
(2b) possible UTF alternative:
preg_match("/^([a-z0-9\+_\-]+)(\.[a-z0-9\+_\-]+)*@([a-z0-9\-]+\.)+[a-z]{2,6}$/ixu",$email))
(3a) normalize newlines:
preg_replace("/(\n){2,}/","\n\n",$str);
(3b) possible UTF alternative:
preg_replace("/(\n){2,}/u","\n\n",$str);
Do thse changes look alright?
精彩评论