
How can one prevent double encoding of HTML entities when they are allowed in the input?

How can I prevent double encoding of HTML entities, or fix them programmatically?

I am using the encode() function from the HTML::Entities Perl module to encode HTML entities in user input. The problem is that we also allow users to enter HTML entities directly, and these entities end up being double encoded.

For example, a user may enter:

Stackoverflow &amp; Perl = Awesome&hellip;

This ends up being encoded to

Stackoverflow &amp;amp; Perl = Awesome&amp;hellip;

This renders in the browser as

Stackoverflow &amp; Perl = Awesome&hellip;

We want this to render as

Stackoverflow & Perl = Awesome...

Is there a way to prevent this double encoding? Or is there a module or snippet of code that can easily correct these double encoding issues?

Any help is greatly appreciated!


You can decode the string first:

use HTML::Entities;

my $input = from_user();

# Decode any entities the user typed, then encode the result exactly once.
my $encoded = encode_entities( decode_entities $input );
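
As a rough check of the decode-then-encode idea, the sketch below runs it on the example string from the question. The helper name encode_once() is made up for illustration, and the code assumes HTML::Entities is installed; it is only meant to show that repeating the step does not pile up further encoding.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities decode_entities);

# encode_once() is a hypothetical helper name, not part of HTML::Entities.
# It decodes whatever entities are present, then encodes the plain text once.
sub encode_once {
    my ($text) = @_;
    my $plain   = decode_entities($text);
    my $encoded = encode_entities($plain);
    return $encoded;
}

my $input = 'Stackoverflow &amp; Perl = Awesome&hellip;';
my $once  = encode_once($input);
my $twice = encode_once($once);

print "$once\n";
print( ( $once eq $twice ) ? "stable\n" : "changed\n" );
```

One trade-off to keep in mind: with this approach a user who really wants the literal text "&amp;" to appear on the page cannot get it, because it is treated as an entity and shown as "&".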


There is an extremely simple way to avoid this:

  1. Remove all the entities upon input (turn them into Unicode)
  2. Encode into entities again at the stage of output.
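
The two steps above might look roughly like this. The function names normalize_input() and render_output() are invented for the sketch, and where the decoded text is stored in between (database, session, and so on) is left out.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities decode_entities);

# Step 1 (input): turn any entities into plain Unicode characters.
# normalize_input() is a hypothetical name used only for this sketch.
sub normalize_input {
    my ($raw) = @_;
    my $text = decode_entities($raw);
    return $text;
}

# Step 2 (output): encode exactly once, just before the text goes into HTML.
# render_output() is likewise a hypothetical name.
sub render_output {
    my ($stored) = @_;
    my $html = encode_entities($stored);
    return $html;
}

# $stored holds plain text (a real '&' and a real ellipsis character).
my $stored = normalize_input('Stackoverflow &amp; Perl = Awesome&hellip;');

# $html is what would be written into the page.
my $html = render_output($stored);
```

Since the stored value never contains entities, the output stage can always encode without having to guess whether the text was already encoded.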


Consider deferring the call to encode() until you retrieve the value for display, rather than calling it before you store the value. So long as you are consistent in your retrieval mechanism, the extra data in your database probably isn't worth fretting over.

Edit

Re-reading your question, I realize that my answer doesn't fully address the issue, since calling encode() later will still produce the same result. I don't know of an alternative myself, so this may not be much help, but you may want to look for a more suitable encoding method that respects existing entities.
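
One possible shape for an encoder that respects existing entities is sketched below. This is a regex heuristic of my own, not a feature of HTML::Entities: the name encode_preserving_entities() is hypothetical, and anything that merely looks like an entity (for example the "&T;" in "AT&T;") is left untouched.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# encode_preserving_entities() is a hypothetical helper for illustration.
sub encode_preserving_entities {
    my ($text) = @_;

    # Escape only the ampersands that do not already start something
    # shaped like an entity: &name; or &#123; or &#x1F;
    $text =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/&amp;/g;

    # Encode the other markup-significant characters. '&' is deliberately
    # not in this set, so the entities kept above are not encoded again.
    # Non-ASCII characters are left as they are in this sketch.
    my $encoded = encode_entities( $text, q{<>"'} );
    return $encoded;
}

my $mixed  = 'Fish & Chips &amp; <b>Tea</b>&hellip;';
my $result = encode_preserving_entities($mixed);
print "$result\n";
```

Here the bare "&" and the angle brackets get encoded while "&amp;" and "&hellip;" pass through unchanged. As with decoding first, there is no way for a user to display the literal text "&amp;".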
