
Why does matching this regular expression indicate safety?

I have a JSP-based defense against XSS attacks that checks whether the content matches a regular expression to decide whether it is safe. Here is the code:

String contents = bodyContent.getString();
// A plain literal is enough; new String(...) just makes a needless copy
String regExp = "^\\w{5,25}$";
// Do a regex to find the good stuff
if (contents.matches(regExp)) {
    // write the original content
} else {
    // change content to make it safe and write it
}

My question is about the regular expression "^\w{5,25}$", which you can see visualized here. Why does matching this regular expression indicate safety?


If the regular expression was:

 ^\w{5,25}$

then this would limit the string to 5 to 25 letters, digits and underscores, i.e. no spaces or other punctuation. This means it cannot be a nefarious script, as that would surely include spaces or semicolons.
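To see concretely what the pattern lets through, here is a minimal Java sketch. The class name WordFilterDemo and the sample inputs are illustrative assumptions; it relies only on java.util.regex.Pattern. Since Matcher.matches() (like String.matches()) must match the entire input, the ^ and $ anchors are redundant there but harmless.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: checks sample inputs against the pattern from the question.
public class WordFilterDemo {
    // Compile once; matches() below requires the whole input to match.
    static final Pattern SAFE = Pattern.compile("^\\w{5,25}$");

    static boolean isSafe(String s) {
        return SAFE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "user_name42",               // letters, digits and underscore only
            "hello world",               // contains a space
            "alert(1);",                 // parentheses and a semicolon
            "<script>alert(1)</script>"  // angle brackets and a slash
        };
        for (String in : inputs) {
            System.out.println(isSafe(in) + "  " + in);
        }
    }
}
```

Only the first input passes; each of the others contains at least one character outside the word class.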


That railroad diagram is incorrect: "\w" is a special regex class that matches so-called word characters, namely A-Z, a-z, 0-9 and the underscore.
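In Java specifically, \w is ASCII-only by default, so it is equivalent to [A-Za-z0-9_]; the Pattern.UNICODE_CHARACTER_CLASS flag (Java 7+) widens it to Unicode letters and digits. A minimal sketch that checks this, with a hypothetical class name:

```java
import java.util.regex.Pattern;

// Hypothetical demo: compares \w against an explicit ASCII character class.
public class WordClassDemo {
    static final Pattern WORD = Pattern.compile("\\w");
    static final Pattern EXPLICIT = Pattern.compile("[A-Za-z0-9_]");
    // Same \w, but with Unicode semantics enabled.
    static final Pattern UNICODE_WORD =
        Pattern.compile("\\w", Pattern.UNICODE_CHARACTER_CLASS);

    // True if \w and the explicit class agree on every ASCII character.
    static boolean asciiEquivalent() {
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (WORD.matcher(s).matches() != EXPLICIT.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("ASCII equivalent: " + asciiEquivalent());
        // U+00E9 (e with acute accent): rejected by default \w, accepted with the flag.
        System.out.println(WORD.matcher("\u00e9").matches());
        System.out.println(UNICODE_WORD.matcher("\u00e9").matches());
    }
}
```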

Input matching this is usually considered safe, since it cannot include any of the commonly used special or escape characters, but that is by no means a guarantee.


Apart from the concrete question, which others have already answered, that's a plain wrong way to protect your JSPs against XSS attacks. You should just use the JSTL <c:out> tag or the fn:escapeXml() function to redisplay user-controlled data.

E.g.

<c:out value="${header['user-agent']}" />

or

<input type="text" name="foo" value="${fn:escapeXml(param.foo)}" />

This way, HTML/XML special characters such as <, > and so on won't be interpreted by the browser as markup (which would open a potential XSS hole); instead they are escaped so that they are simply displayed as-is.

Behind the scenes, this is just a literal character-by-character match and replace: every < is replaced by &lt;, every > by &gt;, every " by &quot;, and so on. It does not, and should not, involve regex.
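The character-by-character replacement can be sketched in plain Java as below. This is a hypothetical helper, not JSTL's actual source; it escapes &, <, >, " and ', and the exact entity spellings JSTL emits (for example for quotes) may differ from the ones chosen here. Treating null as an empty string is also an assumption of this sketch.

```java
// Hypothetical helper illustrating char-by-char escaping; not JSTL's implementation.
public class HtmlEscapeDemo {
    static String escapeXml(String input) {
        if (input == null) {
            return ""; // assumption: render null as empty output
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break; // escape first-class, or text like "&lt;" would be reinterpreted
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);        // all other characters pass through unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
        System.out.println(escapeXml("<script>alert(\"x\")</script>"));
    }
}
```

A single pass with a switch avoids the ordering bugs that chained replace() calls can introduce (e.g. double-escaping & after it has already been emitted).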


You're matching a run of "word" characters anchored to the start and end of the string, so we know there's no punctuation other than _ in it.

Anything matching this is deemed safe; I guess the authors assume that nothing evil can be done with such a string.

I can't understand why fewer than 5 characters would be deemed unsafe.

Nor do I see why, if a string of 25 such characters is safe, one of 26 is not.
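The bounds come purely from the {5,25} quantifier; nothing about the characters themselves changes at 4 or 26. A quick sketch probing the edges with runs of the letter "a" (class and method names are hypothetical; String.repeat requires Java 11+):

```java
import java.util.regex.Pattern;

// Hypothetical demo: probes the length limits imposed by {5,25}.
public class LengthBoundsDemo {
    static final Pattern SAFE = Pattern.compile("^\\w{5,25}$");

    // True if a string of n word characters passes the check.
    static boolean matchesLength(int n) {
        return SAFE.matcher("a".repeat(n)).matches();
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 4, 5, 25, 26}) {
            System.out.println(n + " chars -> " + matchesLength(n));
        }
    }
}
```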


Your regex validates that the string contains only characters from the "word" class, [A-Za-z0-9_]. So it only checks that there is no punctuation or other special characters (apart from the underscore) in the string. It also validates the length, from 5 to 25 characters.

An XSS attack commonly relies on a <script>...</script> block getting inserted into the database, and that obviously contains several special characters ([<>/]).


The only reason I can think of why fewer than five characters would be "unsafe" is that, if the string were used as a search query, 1 to 4 characters might return an excessive number of results. Many database-driven search functions require a minimum of 3 to 5 characters to avoid huge numbers of hits. Will this string be used for any sort of string matching?
