Encoding question in Perl
I have an encoding question and would like to ask for help. I notice that if I choose "UTF-8" as the encoding, there are (at least) two kinds of double quote: " and “. But when I choose "ISO-8859-1" as the encoding, I see the latter double quote become ¡°, or sometimes, for example, “.
Could anyone please explain why this is the case? How can I match “ and replace it with " using a regexp in Perl?
Thanks a lot.
ISO-8859-1 is a one-byte-per-character encoding. The fancy Unicode double quotes are not in the ISO-8859-1 character set. So what you are seeing is a multi-byte character whose bytes are each being displayed as a separate ISO-8859-1 character.
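To make the byte-level picture concrete, here is a rough sketch using the core Encode module. The helper name misread is made up for illustration. It assumes the “ output came from UTF-8 bytes shown by a viewer that treats ISO-8859-1 as Windows-1252 (as browsers commonly do), and it guesses that the ¡° output came from a file that was actually saved as GBK rather than UTF-8; the question does not say which encoding the file really used, so treat that second part as a guess.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: encode $char as $encoding, then decode the resulting
# bytes as $wrong, the way a viewer set to the wrong encoding would.
sub misread {
    my ($char, $encoding, $wrong) = @_;
    my $bytes = encode($encoding, $char);
    return decode($wrong, $bytes);
}

my $left_quote = "\x{201C}";    # LEFT DOUBLE QUOTATION MARK

binmode(STDOUT, ':encoding(UTF-8)');

# The three bytes that UTF-8 uses for U+201C, shown as hex.
printf "UTF-8 bytes: %s\n", uc unpack('H*', encode('UTF-8', $left_quote));

# UTF-8 bytes read as Windows-1252: three separate characters.
print misread($left_quote, 'UTF-8', 'cp1252'), "\n";

# Assumption: GBK bytes read as ISO-8859-1: two separate characters.
print misread($left_quote, 'gbk', 'iso-8859-1'), "\n";
```

Either way the underlying point is the same: one character is stored as several bytes, and a one-byte-per-character encoding shows each byte as its own character.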
To match these weird things, see the perlunicode man page, especially the \x{...} and \N{...} escape sequences.
To answer your question, try \x{201C} to match the Unicode LEFT DOUBLE QUOTATION MARK and \x{201D} to match the RIGHT DOUBLE QUOTATION MARK. You missed the latter in your question :-).
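A minimal sketch of the substitution, assuming the text has already been decoded into Perl characters (the \x{201C} escape will not match raw, undecoded UTF-8 bytes). The sub name straighten_quotes is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: replace curly double quotes with the plain
# ASCII QUOTATION MARK.
sub straighten_quotes {
    my ($text) = @_;
    # \x{201C} = LEFT DOUBLE QUOTATION MARK
    # \x{201D} = RIGHT DOUBLE QUOTATION MARK
    $text =~ s/[\x{201C}\x{201D}]/"/g;
    return $text;
}

binmode(STDOUT, ':encoding(UTF-8)');
print straighten_quotes("He said \x{201C}hello\x{201D}.\n");
# prints: He said "hello".
```

When the text comes from a UTF-8 file, opening it with a decoding layer, for example open(my $fh, '<:encoding(UTF-8)', $path), should give you characters rather than bytes, so the pattern above can match.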
[update]
I should have provided my reference... Some nice gentleman in the UK has a page on ASCII and Unicode quotation marks. The plain vanilla ASCII/ISO-8859-1 double-quote is just called QUOTATION MARK.
Maybe this old post will help.