Secure XSS cleaning function (updated regularly) [duplicate]

Posted 2023-03-13 11:58

This question already has answers here: How to prevent XSS with HTML/PHP? (9 answers) Closed 24 days ago.

I've been hunting around the net now for a few days trying to figure this out but getting conflicting answers.

Is there a library, class or function for PHP that securely sanitizes/encodes a string against XSS? It needs to be updated regularly to counter new attacks.

I have a few use cases:

Use case 1) I have a plain text field, say for a First Name or Last Name

User enters text into field and submits the form
Before this is saved to the database I want to a) trim any whitespace off the front and end of the string, and b) strip all HTML tags from the input. It's a name text field, they shouldn't have any HTML in it.
Then I will save this to the database with PDO prepared statements.

I'm thinking I could just do trim() and strip_tags(), then use a sanitize filter or a regex with a whitelist of characters. Do they really need characters like !, ?, < or > in their name? Not really.
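A minimal sketch of what that input step could look like. The helper name cleanName and the exact set of allowed characters are assumptions made for illustration, not part of any library. This only validates the input; the value still has to be encoded on output, as in use case 2.

```php
<?php
// Hypothetical helper: the name and the allowed character set are assumptions.
// Returns the cleaned name, or null if it is empty or contains characters
// outside the whitelist.
function cleanName(string $raw): ?string
{
    // a) strip anything that looks like an HTML tag, b) trim surrounding whitespace
    $name = trim(strip_tags($raw));

    // Whitelist: Unicode letters and combining marks, plus space, apostrophe,
    // dot and hyphen. The /u modifier makes preg_match treat the input as
    // UTF-8; it returns false on invalid UTF-8, which is rejected here too.
    if ($name === '' || preg_match('/^[\p{L}\p{M}\' .-]+$/u', $name) !== 1) {
        return null;
    }

    return $name;
}

var_dump(cleanName("  <b>Anne-Marie</b> ")); // tags and outer whitespace removed
var_dump(cleanName('Bob!'));                 // rejected because of the "!"
```

Rejecting the input instead of silently dropping characters is a design choice here; it lets the form show an error rather than store a name the user did not type.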

Use case 2) When outputting the contents from a previously saved database record (or from a previously submitted form) to the View/HTML I want to thoroughly clean it for XSS. NB: It may or may not have gone through the filtering step in use case 1 as it could be a different type of input, so assume no sanitizing has been done.

Initially I thought HTMLPurifier would do the job, but it seems it is not what I need, judging by the reply I got when I posed the question to their support:

Here's the litmus test: if a user submits <b>foo</b>, should it show up as the literal text <b>foo</b> or as foo in bold? If the former, you don't need HTML Purifier.

So I'd rather it showed up as the literal text <b>foo</b>, because I don't want any HTML rendered for a simple text field or any JavaScript executing.

So I've been hunting around for a function that will do it all for me. I stumbled across the xss_clean method used by Kohana 3.0 which I'm guessing works but it's only if you want to keep the HTML. It's now deprecated from Kohana 3.1 as they've replaced it with HTMLPurifier. So I'm guessing you're supposed to do HTML::chars() instead which only does this code:

public static function chars($value, $double_encode = TRUE)
{
    return htmlspecialchars( (string) $value, ENT_QUOTES, Kohana::$charset, $double_encode);
}

Now apparently you're supposed to use htmlentities instead, as mentioned in quite a few places on Stack Overflow, because it's more secure than htmlspecialchars.

So how do I use htmlentities properly?
Is that all I need?
How does it protect against hex, decimal and base64 encoded values being sent from the attacks listed here?

Now I see that the 3rd parameter for the htmlentities method is the charset to be used in conversion. Now my site/db is in UTF-8, but perhaps the form submitted data was not UTF-8 encoded, maybe they submitted ASCII or HEX so maybe I need to convert it to UTF-8 first? That would mean some code like:

$encoding = mb_detect_encoding($input);
$input = mb_convert_encoding($input, 'UTF-8', $encoding);
$input = htmlentities($input, ENT_QUOTES, 'UTF-8');

Yes or no? Then I'm still not sure how to protect against the hex, decimal and base64 possible XSS inputs...

If there's some library or open source PHP framework that can do XSS protection properly I'd be interested to see how they do it in code.

Any help much appreciated, sorry for the long post!

To answer the bold question: Yes, there is. It's called htmlspecialchars.

It needs to be updated regularly to counter new attacks.

The right way to prevent XSS attacks is not countering specific attacks, filtering/sanitizing data, but proper encoding, everywhere.

htmlspecialchars (or htmlentities), combined with a sensible choice of character encoding (i.e. UTF-8) and an explicit declaration of that encoding, is sufficient to protect against all XSS attacks. Fortunately, calling htmlspecialchars without an explicit encoding (before PHP 5.4 it then assumes ISO-8859-1; since PHP 5.4 the default is UTF-8) happens to work out for UTF-8, too. If you want to make that explicit, create a helper function:

// Don't forget to specify UTF-8 as the document's encoding
function htmlEncode($s) {
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}
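As a rough usage sketch, this is what the helper does to the inputs discussed in the question. The sample strings are made up for illustration; the function body is the same as the one above, repeated so the snippet runs on its own.

```php
<?php
// Same helper as above, with type declarations added.
function htmlEncode(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

// Element content: markup is shown as literal text instead of being rendered.
echo '<p>' . htmlEncode('<b>foo</b>') . '</p>', PHP_EOL;
// <p>&lt;b&gt;foo&lt;/b&gt;</p>

// Attribute value: ENT_QUOTES encodes both quote characters, so the value
// cannot break out of a quoted attribute.
echo '<input value="' . htmlEncode('" onmouseover="alert(1)') . '">', PHP_EOL;
// <input value="&quot; onmouseover=&quot;alert(1)">

echo htmlEncode("O'Brien"), PHP_EOL;
// O&#039;Brien
```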

Oh, and to address the form worries: Don't try to detect encodings, it's bound to fail. Instead, give out the form in UTF-8. Every browser will send user inputs in UTF-8 then.

Addressing specific concerns:

(...) you're supposed to use htmlentities because htmlspecialchars is vulnerable to UTF-7 XSS exploit.

The UTF-7 XSS exploit can only be applied if the browser thinks a document is encoded in UTF-7. Specifying the document encoding as UTF-8 (in the HTTP header/a meta tag right after <head>) prevents this.
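A minimal sketch of declaring the encoding in both places. The function name buildUtf8Page and the page skeleton are assumptions for illustration; the function only returns the header line and the markup, so that the caller decides when to send them.

```php
<?php
// Hypothetical helper: returns the Content-Type header line and the HTML
// for a page that declares UTF-8 in the HTTP header and in a meta tag.
function buildUtf8Page(string $text): array
{
    $header = 'Content-Type: text/html; charset=UTF-8';

    // The meta tag comes right after <head>, before any user-supplied text.
    $html = "<!DOCTYPE html>\n"
          . "<html>\n<head>\n"
          . "<meta charset=\"UTF-8\">\n"
          . "<title>Example</title>\n</head>\n<body>\n"
          . '<p>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . "</p>\n"
          . "</body>\n</html>\n";

    return [$header, $html];
}

[$header, $html] = buildUtf8Page('Hello <world>');

// In a web request, send the header before any output:
// header($header);
// echo $html;
```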

Also if I don't detect the encoding, what's to stop an attacker downloading the html file, then altering it to UTF-7 or some other encoding, then submitting the POST request back to my server from the altered html page?

This attack scenario is unnecessarily complex. The attacker could just craft a UTF-7 string, no need to download anything.

If you accept the attacker's POST (i.e. you're accepting anonymous public user input), your server will just interpret the UTF-7 string as a weird UTF-8 one. That is not a problem; the attacker's post will just show up garbled. The attacker could achieve the same effect (sending strange text) by submitting "grfnlk" a hundred times.

If my method only works for UTF-8 then the XSS attack will get through, no?

No, it won't. Encodings are not magic. An encoding is just a way to interpret a binary string. For example, the string "ö" is encoded as (hexadecimal) 2B 41 50 59 in UTF-7 (and C3 B6 in UTF-8). Decoding 2B 41 50 59 as UTF-8 yields "+APY": harmless, seemingly random characters.
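The byte values above can be checked with a short sketch. The UTF-7 payload string is an illustrative example ("+ADw-" and "+AD4-" are the UTF-7 forms of < and >), not something taken from a real attack.

```php
<?php
// "ö" (U+00F6) as raw bytes, using the values given above.
$utf8Bytes = "\xC3\xB6";           // C3 B6
$utf7Bytes = hex2bin('2B415059');  // 2B 41 50 59

// Read as ASCII/UTF-8, the UTF-7 bytes are four ordinary characters.
echo $utf7Bytes, PHP_EOL;          // +APY

// A UTF-7 encoded "<script>alert(1)</script>" (illustrative payload).
$utf7Payload = '+ADw-script+AD4-alert(1)+ADw-/script+AD4-';

// It contains none of the characters htmlspecialchars encodes, so it passes
// through unchanged. In a page declared as UTF-8 the browser shows it as
// this literal text and never decodes it as UTF-7.
$encoded = htmlspecialchars($utf7Payload, ENT_QUOTES, 'UTF-8');
echo $encoded, PHP_EOL;
```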

Also how does htmlentities protect against HEX or other XSS attacks?

Hexadecimal data will be output as just that. An attacker sending "3C" will post a message "3C". "3C" can only become < if you actively interpret hexadecimal input, for example by mapping it to Unicode code points and then outputting those. That just means that if you're accepting data in something other than plain UTF-8 (for example base32-encoded UTF-8), you'll first have to unpack your encoding, and then use htmlspecialchars before including it between HTML code.
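A small sketch of the "unpack first, then encode" order. It uses base64 rather than base32, because PHP ships base64_decode but has no built-in base32 function; the variable names and the sample payload are assumptions for illustration.

```php
<?php
// Plain hexadecimal text is not special in HTML and is output as-is.
$hexText = htmlspecialchars('3C', ENT_QUOTES, 'UTF-8');
echo $hexText, PHP_EOL;  // 3C

// Hypothetical transport format: the client sends base64-wrapped UTF-8 text.
$wire = base64_encode('<script>alert(1)</script>');

// 1) Unpack the transport encoding (strict mode returns false on bad input).
$decoded = base64_decode($wire, true);

// 2) Only then HTML-encode the result for output.
$safe = $decoded === false ? '' : htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8');
echo $safe, PHP_EOL;     // &lt;script&gt;alert(1)&lt;/script&gt;
```

Encoding before decoding would leave the base64 text untouched and let the raw markup through after unpacking, which is why the order matters.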

Many security engineers recommend this library for this specific problem:

https://www.owasp.org/index.php/Category:OWASP_Enterprise_Security_API
