Laundering tainted data
When I launder tainted data by checking whether it contains any bad characters, are there Unicode properties that will filter out the bad characters?
See "User-Defined Character Properties" in perlunicode:
package Characters::Sid_com;
sub InBad {
return <<"BAD";
0000\t10FFFF
BAD
}
sub InEvil {
return <<"EVIL";
0488
0489
EVIL
}
sub InStupid {
return <<"STUPID";
E630\tE64F
F8D0\tF8FF
STUPID
}
⋮
die 'No.' if $tring =~ /
(?: \p{Characters::Sid_com::InBad}
| \p{Characters::Sid_com::InEvil}
| \p{Characters::Sid_com::InStupid}
)
/x;
I think "no" is an understatement for an answer, but there you have it. No, Unicode does not have a concept of "bad" or "good" characters (let alone "ugly" ones).
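Since Unicode cannot say which characters are bad, the usual approach to laundering (described in perlsec) is to state which characters are acceptable for a particular field and capture only those. The sketch below assumes a hypothetical "name" field; the function name `launder_name`, the allowed set, and the length limit are illustrative choices, not a recommendation for every kind of input.

```perl
use strict;
use warnings;

# Hypothetical policy for a "name" field: letters, combining marks, digits,
# spaces, apostrophes, periods and hyphens, 1 to 100 characters long.
sub launder_name {
    my ($input) = @_;
    return unless defined $input;

    # \A and \z anchor the whole string, so a trailing newline is rejected too.
    if ($input =~ /\A([\p{L}\p{M}\p{N} '.\-]{1,100})\z/) {
        return $1;    # a capture group is how Perl untaints data under -T
    }
    return;           # anything outside the allowed set is rejected
}
```

Called as `launder_name($cgi_value)`, it returns the cleaned string, or undef when the input contains anything outside the allowed set.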
XML (and thus XHTML) can only contain these characters:
\x09 \x0A \x0D
\x{0020}-\x{D7FF}
\x{E000}-\x{FFFD}
\x{10000}-\x{10FFFF}
Of the above, the following should be avoided:
\x7F-\x84
\x86-\x9F
\x{FDD0}-\x{FDEF}
\x{1FFFE}-\x{1FFFF}
\x{2FFFE}-\x{2FFFF}
\x{3FFFE}-\x{3FFFF}
\x{4FFFE}-\x{4FFFF}
\x{5FFFE}-\x{5FFFF}
\x{6FFFE}-\x{6FFFF}
\x{7FFFE}-\x{7FFFF}
\x{8FFFE}-\x{8FFFF}
\x{9FFFE}-\x{9FFFF}
\x{AFFFE}-\x{AFFFF}
\x{BFFFE}-\x{BFFFF}
\x{CFFFE}-\x{CFFFF}
\x{DFFFE}-\x{DFFFF}
\x{EFFFE}-\x{EFFFF}
\x{FFFFE}-\x{FFFFF}
\x{10FFFE}-\x{10FFFF}
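The two lists above can be turned into a check on code points. This is only a sketch: the function names are made up for illustration, and it compares numeric code points rather than using a regex so that the noncharacter ranges can be expressed with a single bit mask.

```perl
use strict;
use warnings;

# True if the code point is in the set XML allows (the first list above).
sub is_xml_char {
    my ($cp) = @_;
    return 1 if $cp == 0x09 || $cp == 0x0A || $cp == 0x0D;
    return 1 if $cp >= 0x20    && $cp <= 0xD7FF;
    return 1 if $cp >= 0xE000  && $cp <= 0xFFFD;
    return 1 if $cp >= 0x10000 && $cp <= 0x10FFFF;
    return 0;
}

# True if the code point is in the "should be avoided" set (the second list).
sub is_discouraged {
    my ($cp) = @_;
    return 1 if $cp >= 0x7F   && $cp <= 0x84;
    return 1 if $cp >= 0x86   && $cp <= 0x9F;
    return 1 if $cp >= 0xFDD0 && $cp <= 0xFDEF;
    # U+nFFFE and U+nFFFF in planes 1 through 16.
    return 1 if $cp >= 0x10000 && ($cp & 0xFFFE) == 0xFFFE;
    return 0;
}

# Hypothetical helper: true only if every character is allowed and not discouraged.
sub is_clean_xml_text {
    my ($text) = @_;
    for my $ch (split //, $text) {
        my $cp = ord $ch;
        return 0 unless is_xml_char($cp) && !is_discouraged($cp);
    }
    return 1;
}
```

The string must already be decoded to characters (not raw UTF-8 bytes) for `ord` to return code points.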
If you are generating XHTML, you need to escape the following:
& ⇒ &amp;
< ⇒ &lt;
> ⇒ &gt; (optional)
" ⇒ &quot; (optional except in attribute values delimited with ")
' ⇒ &apos; (optional except in attribute values delimited with ')
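A minimal sketch of that escaping step, assuming the standard XML predefined entities and escaping all five characters unconditionally (which is always safe, if slightly more than strictly required). The name `escape_xhtml` is hypothetical.

```perl
use strict;
use warnings;

# The five characters listed above, mapped to the XML predefined entities.
my %entity_for = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&apos;',
);

# Replace each special character in a single pass, so an '&' that is
# produced by an earlier replacement is never escaped a second time.
sub escape_xhtml {
    my ($text) = @_;
    $text =~ s/([&<>"'])/$entity_for{$1}/g;
    return $text;
}
```

In production code an established module such as HTML::Entities is usually preferable to a hand-rolled routine.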
HTML should have the same if not looser requirements, so if you stick to this, you should be safe.