What does u mean in regex?
I came across this code and I'm at a loss as to what u means:
$todecode =~ s{
%u([Dd][89a-bA-B][0-9a-fA-F]{2}) # hi
%u([Dd][c-fC-F][0-9a-fA-F]{2}) # lo
}{
utf8_chr(
0x10000
+ (hex($1) - 0xD800) * 0x400
+ (hex($2) - 0xDC00)
)
}gex;
It's the letter between t and v. (It's matching a literal u.)
It looks like somebody has some text with UTF-16 surrogate pairs written out as %uD800%uDC00. The code combines each pair into the single code point it encodes, passes that code point to the utf8_chr function, and substitutes the result.
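To make the arithmetic concrete, here is a minimal sketch of the same substitution. I don't know how utf8_chr is defined in your code, so Perl's built-in chr stands in for it here, and the names combine_surrogates and decode_percent_u_pairs are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: combine a UTF-16 surrogate pair (given as hex
# strings) into the single code point it encodes.
sub combine_surrogates {
    my ($hi_hex, $lo_hex) = @_;
    return 0x10000
         + (hex($hi_hex) - 0xD800) * 0x400
         + (hex($lo_hex) - 0xDC00);
}

# Hypothetical wrapper around the substitution from the question.
# Assumption: chr() is used in place of the unknown utf8_chr().
sub decode_percent_u_pairs {
    my ($text) = @_;
    $text =~ s{
        %u([Dd][89abAB][0-9a-fA-F]{2})   # high surrogate, D800-DBFF
        %u([Dd][c-fC-F][0-9a-fA-F]{2})   # low surrogate, DC00-DFFF
    }{
        chr(combine_surrogates($1, $2))
    }gex;
    return $text;
}

my $decoded = decode_percent_u_pairs('smile: %uD83D%uDE00');
printf "U+%X\n", ord(substr($decoded, -1));   # prints U+1F600
```

For %uD83D%uDE00 the sum works out to 0x10000 + 0x3D * 0x400 + 0x200 = 0x1F600. In the pattern, %u is just a percent sign followed by a literal u; the character classes after it are what restrict the match to surrogate values.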