HTML::Entities encoding and single ampersand

I'm trying to use the following line of Perl, described in the question "Does anyone know of a vim plugin or script to convert special characters to their corresponding HTML entities?", to encode HTML entities in Vim.

%!perl -p -i -e 'BEGIN { use HTML::Entities; use Encode; } $_=Encode::decode_utf8($_) unless Encode::is_utf8($_);  $_=Encode::encode("ascii", $_, sub{HTML::Entities::encode_entities(chr shift)});'

It works fine (£ becomes &pound;, curly quotes are converted, etc.) except for an ampersand on its own, &, which is left as it is.

I've tried removing the UTF-8 decoding, and I've looked at the CPAN documentation for HTML::Entities.

Answer:

@ZyX has answered the original question, but as others have pointed out in the comments, this is redundant: HTML entities are not actually necessary if you are serving pages with a UTF-8 character set, which I am, both with the meta tag:

<meta charset="utf-8">

and also in the Apache configuration:

AddDefaultCharset utf-8

Indeed, adding them in such cases is arguably a bad thing: the file size is bigger, and the text is obfuscated should anyone want to make use of the source code.

It's essential to ensure that whatever editor(s) you use to create files are writing them in UTF-8 as well.
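
One way to check that a file really was saved as UTF-8 is to try a strict decode of its raw bytes. The following is only a minimal sketch: the helper name is_valid_utf8 is made up for illustration, and it assumes the files to check are passed as command-line arguments. Note that a pure-ASCII file also counts as valid UTF-8.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not from the original posts): returns 1 if the byte
# string is well-formed UTF-8, 0 otherwise. FB_CROAK makes decode() die on
# malformed input; LEAVE_SRC keeps it from modifying the input string.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $ok = eval {
        Encode::decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

# Read each file named on the command line as raw bytes and report on it.
for my $path (@ARGV) {
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $bytes = do { local $/; <$fh> };
    close $fh;
    printf "%s: %s\n", $path,
        is_valid_utf8($bytes) ? 'valid UTF-8' : 'NOT valid UTF-8';
}
```

A file saved as Latin-1 that contains a character such as £ would be reported as not valid, because a lone 0xA3 byte is not a legal UTF-8 sequence.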


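My answer encoded only characters outside the ASCII range: the callback passed to Encode::encode is called only for characters that cannot be represented in ASCII, and a bare & is plain ASCII, so it passes through untouched. If you want to encode text as HTML, you should use

$text=HTML::Entities::encode_entities($text);

which, as a Vim filter, becomes: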

%!perl -MHTML::Entities -MEncode -p -i -e '$_=Encode::decode_utf8($_) unless Encode::is_utf8($_); $_=HTML::Entities::encode_entities($_);'

I did not use this in that answer because the original poster only asked to encode Unicode characters, without encoding <, >, and & as well.

By the way, you may use $text=HTML::Entities::encode_entities($text, '<>&"'); to encode only the really unsafe characters, though I guess this is just as easily expressed in Vim script:

:let entities={'<': 'lt', '>': 'gt', '&': 'amp', '"': 'quot'}
:execute '%s/['.escape(join(keys(entities), ''), '\-]^').']/\="&".entities[submatch(0)].";"/g'
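
To see the difference between the three Perl variants discussed in this thread, they can be run side by side on one string. This is only an illustrative sketch: the function names and the sample text are made up, and it assumes HTML::Entities is installed from CPAN.

```perl
use strict;
use warnings;
use utf8;    # this source contains a literal non-ASCII character
use Encode ();
use HTML::Entities ();

# Variant from the question (hypothetical wrapper name): the callback runs
# only for characters ASCII cannot represent, so "&" passes through as-is.
sub encode_non_ascii {
    my ($text) = @_;
    return Encode::encode('ascii', $text,
        sub { HTML::Entities::encode_entities(chr shift) });
}

# Default encode_entities: non-ASCII characters plus <, >, &, and quotes.
sub encode_everything {
    my ($text) = @_;
    return HTML::Entities::encode_entities($text);
}

# Restricted set: only the markup-significant characters < > & ".
sub encode_markup_only {
    my ($text) = @_;
    return HTML::Entities::encode_entities($text, '<>&"');
}

binmode STDOUT, ':encoding(UTF-8)';
my $sample = 'Fish & chips: £5';
print encode_non_ascii($sample),   "\n";    # Fish & chips: &pound;5
print encode_everything($sample),  "\n";    # Fish &amp; chips: &pound;5
print encode_markup_only($sample), "\n";    # Fish &amp; chips: £5
```

The last variant is the one that fits pages served as UTF-8: markup characters are escaped while £ and similar characters stay readable in the source.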


perl -MHTML::Entities -i -e 'print encode_entities shift'

should work, shouldn't it?
