Bizarre multibyte preg_replace issue. It is changing my data to smiley faces!

Using PHP 5.3.1 on Windows.

I am just trying to add spaces between numbers and letters, but PHP is mangling my data!

$text = "TUES:8:30AM-5:00PMTHURS:8:30AM-5:00PMSAT:8:00AM-1:00PM";
echo preg_replace("/([0-9]+)([A-Z]+)/","\1 \2",$text);
> TUES:8:☺ ☻AM-5:☺ ☻PMTHURS:8:☺ ☻AM-5:☺ ☻PMSAT:8:☺ ☻AM-1:☺ ☻PM

My file is saved as ANSI, so there is no Unicode in the source.

What the fun is going on here?


Try using $ as your backreference indicator, not '\':

echo preg_replace("/(\d+)([A-Z]+)/","$1 $2",$text);

I'm betting \1 is getting translated to something funky. Notice that the strange characters are the same whether the minutes in the input are '30' or '00'.

The PHP manual says you should double-escape your backreferences, or use $ (if you are using version 4.0.4 or newer).
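
As a rough illustration of the $ form, here is a minimal sketch. The shortened sample string and the variable names are made up for this example. It assumes the goal is a space between each run of digits and the letters that follow it.

```php
<?php
// Shortened sample input (illustrative only).
$text = "TUES:8:30AM-5:00PM";
$pattern = '/(\d+)([A-Z]+)/';

// "$1" survives double quotes: PHP variable names cannot start with a digit,
// so the string reaches the regex engine unchanged.
$doubleQuoted = preg_replace($pattern, "$1 $2", $text);
$singleQuoted = preg_replace($pattern, '$1 $2', $text);

// The braced form ${1} is also accepted in the replacement string;
// keep it in single quotes so PHP does not try to interpolate it.
$braced = preg_replace($pattern, '${1} ${2}', $text);

echo $doubleQuoted, "\n"; // TUES:8:30 AM-5:00 PM
var_dump($doubleQuoted === $singleQuoted && $singleQuoted === $braced); // bool(true)
```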


Use a double backslash when you write backreferences in a string delimited by double quotes:

echo preg_replace("/(\d+)([A-Z]+)/","\\1 \\2",$text);


Inside a double-quoted string, PHP treats \1 and \2 as octal escape sequences and turns them into the characters with ASCII codes 1 and 2 before preg_replace ever sees them. In most standard Windows fonts those two characters show up as the smiley faces you're seeing (when I run the same program on my Linux box, I get character code symbols 0001 and 0002 instead of the smiley faces).
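
One way to check this is to dump the bytes of the replacement string itself. This is a minimal sketch, with variable names chosen only for illustration:

```php
<?php
// Double quotes: "\1" and "\2" are octal escapes, so PHP stores bytes 0x01 and 0x02.
$doubleQuoted = "\1 \2";

// Single quotes: the backslashes are kept, so the regex engine gets real backreferences.
$singleQuoted = '\1 \2';

echo bin2hex($doubleQuoted), "\n"; // 012002
echo bin2hex($singleQuoted), "\n"; // 5c31205c32
```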

If you want to actually use the regex replacement symbols, you need to do one of two things:

  1. Use single quotes for your regex strings, so that PHP doesn't treat the backslashes as escape characters:

    preg_replace('/(\d+)([A-Z]+)/','\1 \2',$text);
    
  2. Use double quotes, but escape the backslashes:

    preg_replace("/(\\d+)([A-Z]+)/","\\1 \\2",$text);
    

I'd suggest the single quote solution as it's easier to read.

Be aware that with double quotes, PHP escaping always takes precedence over regex escaping. This can affect both your regex pattern and your replacement strings. Many PHP escape sequences mean the same thing to the regex engine anyway. For example, \n works the same in the regex pattern regardless of whether PHP or the regex engine interprets it. But some do not work the same, as you've discovered, so you need to be careful.
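
Putting the two options side by side, here is a minimal sketch using the input from the question. The pattern is an assumption about the intent: digits followed by uppercase letters. The variable names are illustrative.

```php
<?php
$text = "TUES:8:30AM-5:00PMTHURS:8:30AM-5:00PMSAT:8:00AM-1:00PM";

// Single quotes: backslashes reach the regex engine untouched.
$single = preg_replace('/(\d+)([A-Z]+)/', '\1 \2', $text);

// Double quotes with doubled backslashes: PHP collapses "\\" to "\" first.
$double = preg_replace("/(\\d+)([A-Z]+)/", "\\1 \\2", $text);

echo $single, "\n";
// TUES:8:30 AM-5:00 PMTHURS:8:30 AM-5:00 PMSAT:8:00 AM-1:00 PM
var_dump($single === $double); // bool(true)

// "\n" matches a newline whether PHP or the regex engine interprets the escape.
$phpEscaped = preg_match("/a\nb/", "a\nb");
$regexEscaped = preg_match('/a\nb/', "a\nb");
var_dump($phpEscaped === 1 && $regexEscaped === 1); // bool(true)
```

Note that the space goes only between a digit run and the letters after it, so "PMTHURS" stays joined. Separating those would need an additional rule.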
