What's the difference between Word Boundaries and Start of String and End of String Anchors (Regex)?
Why are the two regular expressions evaluating the email differently in this example?
http://codepad.viper-7.com/SEgMzZ
<?php
$email = 'ΘΘΘme@gmail.com';
$regex = '#\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b#i';
$regex2 = '#^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$#i';
if (preg_match($regex, $email)) {
echo "A match was found.";
} else {
echo "A match was not found.";
}
if (preg_match($regex2, $email)) {
echo "A match was found.";
} else {
echo "A match was not found.";
}
?>
EDIT: I expect both of these to NOT match
The problem is with the Θ characters (U+0398, Greek capital letter theta). Without the u modifier, PHP matches byte by byte and does not consider the bytes of Θ to be word characters, so there is a word boundary between ΘΘΘ and me@....
The first regex matches because \b is satisfied at that boundary and the rest of the string, me@gmail.com, fits the pattern.
The second doesn't match because ^ anchors the pattern to the very start of the string, and Θ is not in the first character class, so the string fails at its first character.
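To see where the \b version finds its match, you can ask preg_match for the matched text and its offset. This is only a small sketch; it assumes the script is saved as UTF-8 (so each Θ is two bytes), and the variable names $m, $text and $offset are mine.

```php
<?php
// Assumption: this file is saved as UTF-8, so each Θ occupies two bytes.
$email = 'ΘΘΘme@gmail.com';
$regex = '#\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b#i';

// PREG_OFFSET_CAPTURE makes $m[0] an array of [matched text, byte offset].
$text = null;
$offset = null;
if (preg_match($regex, $email, $m, PREG_OFFSET_CAPTURE)) {
    list($text, $offset) = $m[0];
    echo "matched '$text' at byte offset $offset\n";
} else {
    echo "no match\n";
}
```

Under that assumption the match should be me@gmail.com starting at byte offset 6, i.e. right after the three Θ characters. The \b version never has to account for the Θ prefix, while the ^ version does.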
As Wrikken points out, you can add the u modifier (PCRE_UTF8) to your regex to make PHP treat both the pattern and the string as UTF-8. In that case Θ counts as a word character, so it no longer introduces a word boundary before me, and both expressions fail to match.
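A quick way to check this is to run both patterns with and without the u modifier. This is a rough sketch, assuming the script is saved as UTF-8 and a PHP build whose u modifier also enables Unicode-aware \b (true of reasonably recent versions as far as I know); $body, $patterns and $results are just names I picked.

```php
<?php
// Assumption: this file is saved as UTF-8.
$email = 'ΘΘΘme@gmail.com';
$body  = '[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}';

// Same pattern body, wrapped in \b...\b or ^...$, with and without "u".
$patterns = array(
    'word boundaries, no u' => '#\b' . $body . '\b#i',
    'anchors, no u'         => '#^' . $body . '$#i',
    'word boundaries, u'    => '#\b' . $body . '\b#iu',
    'anchors, u'            => '#^' . $body . '$#iu',
);

$results = array();
foreach ($patterns as $label => $pattern) {
    // preg_match returns 1 on a match, 0 on no match.
    $results[$label] = preg_match($pattern, $email);
    echo $label, ': ', $results[$label] ? 'match' : 'no match', "\n";
}
```

Only the first combination (word boundaries without u) should report a match; the other three should report no match, which is the behaviour the question expected from both patterns.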