Why does matching this regular expression indicate safety?

2023-03-25 12:59

I have a JSP remediation for XSS attacks that checks whether the content matches a regular expression to decide whether it is safe. Here is the code:

String contents = bodyContent.getString();
// Accept only 5 to 25 word characters (letters, digits, underscore)
String regExp = "^\\w{5,25}$";
// Check whether the whole content matches the pattern
if (contents.matches(regExp)) {
    // write the original content
} else {
    // change content to make it safe and write it
}

My question is about the regular expression "^\w{5,25}$", which you can see visualized here. Why does matching this regular expression indicate safety?

If the regular expression was:

 ^\w{5,25}$

then it would limit the string to letters, numbers and underscores, i.e. no spaces or other punctuation. This means it cannot be a nefarious script, as that would surely include spaces or semicolons.
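As a rough illustration, the check can be run against a few sample inputs. Only the pattern comes from the question; the class name WordCheckDemo, the helper looksSafe and the sample strings are made up for this sketch.

```java
import java.util.regex.Pattern;

class WordCheckDemo {
    // Same pattern as in the question. With matches() the whole input must
    // match anyway, so the ^ and $ anchors are redundant but harmless.
    private static final Pattern SAFE = Pattern.compile("^\\w{5,25}$");

    // Hypothetical helper: true if the input is 5 to 25 word characters.
    static boolean looksSafe(String contents) {
        return contents != null && SAFE.matcher(contents).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "user_name42",                // letters, digits, underscore
            "abcd",                       // only 4 characters
            "hello world",                // contains a space
            "<script>alert(1)</script>"   // contains <, >, (, ) and /
        };
        for (String s : samples) {
            System.out.println(s + " -> " + looksSafe(s));
        }
    }
}
```

Only the first sample passes; the others fail either the length bounds or the character class.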

That railroad diagram is incorrect. "\w" is a regex shorthand that matches so-called word characters: A-Z, a-z, 0-9 and the underscore.

Input matching this is usually considered safe, since it cannot include any of the commonly used special or escape characters, but that is by no means a guarantee.

Apart from the concrete question, which has already been answered by others: this is simply the wrong way to protect your JSPs against XSS attacks. You should just use the JSTL <c:out> tag or the fn:escapeXml() function to redisplay user-controlled data.

E.g.

<c:out value="${header['user-agent']}" />

<input type="text" name="foo" value="${fn:escapeXml(param.foo)}" />

This way, HTML/XML special characters like <, > and so on won't be interpreted as markup (which would open a potential XSS hole), but will be escaped so that they are simply displayed as-is.

Behind the scenes this is just a literal char-by-char match and replace: every < is replaced by &lt;, every > by &gt;, every " by &quot;, and so on. This does not, and should not, involve regex.
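A minimal sketch of such a char-by-char replacement in plain Java might look like the following. It is not the JSTL implementation; the class EscapeSketch and its escapeXml method are hypothetical names, and the exact entity emitted for each character is an assumption of this sketch.

```java
class EscapeSketch {
    // Hypothetical helper for illustration only; not the JSTL source.
    // Walks the input once and swaps each special character for an entity.
    static String escapeXml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);        // everything else is copied unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
        System.out.println(escapeXml("<script>alert(\"x\")</script>"));
    }
}
```

In a real JSP you would still rely on <c:out> or fn:escapeXml() rather than a hand-rolled helper; the sketch only shows that no regex is needed for this kind of escaping.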

You're matching a number of "word" characters, anchored to start and end of string. So we know there's no punctuation other than _ in that set.

Anything matching this set is deemed safe. I guess the authors assume that nothing evil can be done with such a string.

I can't understand why fewer than 5 characters is deemed unsafe.

Nor do I see why, if a string of 25 such characters is safe, a string of 26 is not.

Your regex validates that the string contains only characters from the "word" character class, [a-zA-Z0-9_]. So it simply checks that there is no punctuation (other than the underscore) and no other special character in the string. It also validates the length, from 5 to 25 characters.
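To see exactly which ASCII characters \w accepts in Java, one can simply enumerate them. This is a small sketch with a made-up class name, WordClassDemo; it assumes the default regex flags, i.e. Pattern.UNICODE_CHARACTER_CLASS is not set.

```java
import java.util.regex.Pattern;

class WordClassDemo {
    // Collects every ASCII character (code points 0 to 127) that \w accepts.
    static String asciiWordChars() {
        Pattern word = Pattern.compile("\\w");
        StringBuilder sb = new StringBuilder();
        for (char c = 0; c < 128; c++) {
            if (word.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String chars = asciiWordChars();
        // Digits, uppercase letters, the underscore, lowercase letters: 63 in total.
        System.out.println(chars + " (" + chars.length() + " characters)");
    }
}
```

With the default flags, accented letters such as é are not matched by \w either, so the check is effectively ASCII-only.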

An XSS attack commonly relies on a <script>...</script> snippet getting inserted into the database, which obviously contains a couple of special characters [<>/].

The only reason I can think of why fewer than five characters would be "unsafe" is that, if the string were used for a search query, 1 to 4 characters might return an excessive number of results. Many database-driven search functions require a minimum of 3-5 characters to avoid huge numbers of hits. Will this string be used for any sort of string matching?
