PCRE Encoding Support
I saw in the PCRE documentation that PCRE supports UTF-8 and Unicode general category properties, but I don't see where it says which native encoding it supports.
If you say that it supports ISO-8859-1, where can I find information about that?
In a nutshell:
I've compared the two, and I'm guessing that the encoding supported by PHP is Windows-1252, not ISO-8859-1.
```php
// The pattern contains a literal €; the subject is the single byte 0x80.
if (preg_match('/€/', "\x80")) {
    echo "Match";
}
```
ISO-8859-1 doesn't have '€' at that position; Windows-1252 does. Or does it depend on the system?
So which native encoding does PCRE support?
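The claim about byte 0x80 can be checked directly. The sketch below uses Python 3 (not PHP) only as an illustration, assuming its built-in `cp1252` and `latin-1` codecs. It shows what that one byte means in each code page, and how a literal € is stored depending on the encoding the source file is saved in.

```python
# Sketch (Python 3, not PHP): what the single byte 0x80 means
# under each 8-bit code page mentioned in the question.
raw = b"\x80"

as_cp1252 = raw.decode("cp1252")    # Windows-1252: 0x80 is the Euro sign U+20AC
as_latin1 = raw.decode("latin-1")   # ISO-8859-1: 0x80 is the C1 control U+0080

# How a literal Euro sign in a source file is stored, by file encoding.
euro_in_cp1252 = "\u20ac".encode("cp1252")  # one byte: b'\x80'
euro_in_utf8 = "\u20ac".encode("utf-8")     # three bytes: b'\xe2\x82\xac'

print(hex(ord(as_cp1252)), hex(ord(as_latin1)))  # 0x20ac 0x80
print(euro_in_cp1252, euro_in_utf8)
```

So a pattern `'/€/'` typed into a file saved as Windows-1252 is the byte 0x80 and matches `"\x80"` byte for byte, while the same pattern in a UTF-8 file is three bytes and would not match.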
This exact example is used on regular-expressions.info to describe the difficulties of mixing 8-bit and Unicode character codes:
Mixing Unicode and 8-bit Character Codes
In short, the Euro symbol is at 80h on all Windows code pages. How your regex engine treats this may vary. It works when your regex engine is an 8-bit engine and the text file uses a Windows code page.
If your regex engine is a pure Unicode engine, it will read \x80 as \u0080, which is a control code.
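To make the 8-bit versus Unicode distinction concrete, here is a small sketch in Python 3. It is an illustration only, since Python's `re` module is not PCRE. A bytes pattern matched against bytes behaves like an 8-bit engine, and a str pattern matched against decoded text behaves like a Unicode engine.

```python
import re

# 8-bit (byte-oriented) matching: pattern and subject are raw bytes,
# so \x80 means "the byte 0x80", whatever code page produced it.
data = "price: \u20ac5".encode("cp1252")        # b'price: \x805'
byte_match = re.search(rb"\x80", data) is not None

# Unicode (code-point) matching: the subject is decoded text,
# so \x80 means U+0080, a C1 control character, not the Euro sign.
text = data.decode("cp1252")                    # 'price: €5'
codepoint_match = re.search(r"\x80", text) is not None

print(byte_match, codepoint_match)  # True False
```

The same escape `\x80` finds the Euro sign in one mode and misses it in the other, which is exactly the ambiguity the quote warns about.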
So what do you mean by the native encoding PCRE supports? This is system-dependent, and you should not rely on any particular code page.
The advantage of Unicode is that you can get rid of all the different code pages and the problems that come with them.
So, to use Unicode here, try matching \x{20AC}; this is the Unicode code point for the Euro symbol. In PHP this needs the /u modifier and a UTF-8 subject, e.g. preg_match('/\x{20AC}/u', $subject).
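A sketch of the decode-then-match idea in Python 3, again only as an illustration and not PHP. The input is decoded with whatever encoding it actually arrived in, and the pattern then targets the code point U+20AC, so it no longer depends on a code page. The two sample inputs are made up for the example.

```python
import re

# Python's str-pattern spelling of PCRE's \x{20AC} (the Euro sign).
euro = re.compile(r"\u20ac")

# The same (made-up) text delivered in two different byte encodings.
inputs = {
    "utf-8": "Total: 5 \u20ac".encode("utf-8"),    # Euro stored as b'\xe2\x82\xac'
    "cp1252": "Total: 5 \u20ac".encode("cp1252"),  # Euro stored as b'\x80'
}

# Decode each input with its own encoding first, then match on code points.
results = {enc: bool(euro.search(raw.decode(enc))) for enc, raw in inputs.items()}

print(results)  # {'utf-8': True, 'cp1252': True}
```

The only encoding-specific step is the decode; the regex itself is written once, against Unicode code points.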
Here is an overview of the Unicode syntax on regular-expressions.info.