Java: URL encoding leaving 'allowed' characters intact
Simple question from a Java novice. I want to encode a URL so that nonstandard characters are transformed to their hex value (that is, %XX), while characters one expects to see in a URL (letters, digits, forward slashes, question marks and so on) are left intact.
For example, encoding
"hi/hello?who=moris\\boris"
should result in
"hi/hello?who=moris%5cboris"
Any ideas?
The OWASP Enterprise Security API (ESAPI) provides a solution for this.
See the following links for more details:
http://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.235_-_URL_Escape_Before_Inserting_Untrusted_Data_into_HTML_URL_Parameter_Values
http://code.google.com/p/owasp-esapi-java/source/browse/trunk/src/main/java/org/owasp/esapi/codecs/PercentCodec.java
You can use the code below to escape special characters in URLs. However, you need to pass in only the value, not the whole URL:
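import java.nio.charset.StandardCharsets;

// Percent-encodes every byte outside the RFC 3986 "unreserved" set.
// Pass a single value (e.g. one query parameter value), not a whole URL.
public static String escapeSpecialCharacters(String input) {
    StringBuilder resultStr = new StringBuilder();
    // Work on the UTF-8 bytes rather than on chars, so that non-ASCII
    // input still produces valid two-digit %XX escapes.
    for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
        int ch = b & 0xFF;
        if (isSafe((char) ch)) {
            resultStr.append((char) ch);
        } else {
            resultStr.append('%');
            resultStr.append(toHex(ch / 16));
            resultStr.append(toHex(ch % 16));
        }
    }
    return resultStr.toString();
}

// Converts a value in the range 0..15 to an uppercase hex digit.
private static char toHex(int ch) {
    return (char) (ch < 10 ? '0' + ch : 'A' + ch - 10);
}

// True for letters, digits and the four unreserved marks "-_.~".
private static boolean isSafe(char ch) {
    return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z') || (ch >= '0' && ch <= '9') || "-_.~".indexOf(ch) >= 0;
}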
This is actually a rather tricky problem. It is tricky because the different parts of a URL need to be handled (encoded) differently.
In my experience, the best way to do this is to assemble the URL from its components using the java.net.URI class, whose multi-argument constructors quote illegal characters in each component (java.net.URL itself does no escaping).
In fact, now that I think about it, you have to encode the components before they get assembled. Once the parts are assembled, it is impossible to tell whether (for example) a "?" is intended to be the query separator (don't escape it) or a character in a pathname component (escape it).
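As a rough illustration of assembling from components, here is a minimal sketch that uses the five-argument java.net.URI constructor with the example from the question. The class name UriAssemblySketch and the helper assemble are made up for this example; the path and query are passed in separately, so the constructor knows the "?" it inserts is the separator.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriAssemblySketch {

    // Hypothetical helper: builds a relative URI from a raw path and a raw query.
    // The multi-argument URI constructor quotes characters that are not legal
    // in the respective component (for example the backslash).
    static String assemble(String rawPath, String rawQuery) {
        try {
            // scheme, authority and fragment are left out (null) for a relative URI
            URI uri = new URI(null, null, rawPath, rawQuery, null);
            // toASCIIString() additionally percent-encodes non-ASCII characters
            return uri.toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI", e);
        }
    }

    public static void main(String[] args) {
        // prints hi/hello?who=moris%5Cboris
        System.out.println(assemble("hi/hello", "who=moris\\boris"));
    }
}
```

The hex digits come out in upper case (%5C rather than %5c); the two forms are equivalent. Note that this constructor treats the query as one unit, so a literal "&" or "=" inside a parameter value is not escaped for you.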
org.apache.commons.codec.net.URLCodec will encode special characters (e.g. the \ you mentioned). However, you will likely need to break up the URL first, since you don't want characters in the path (such as /) encoded. Additionally, you will need to split up the parameter names and values, since ?, & and = need to remain intact in order to pass the parameters individually and not as one huge parameter name.
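The splitting described above could look roughly like the sketch below. To stay dependency-free it swaps in the JDK's java.net.URLEncoder for Commons Codec's URLCodec; both produce form-style (application/x-www-form-urlencoded) output. The class and method names are invented for illustration, and the sketch assumes the raw names and values do not themselves contain "&" or "=".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryEncodingSketch {

    // Encodes a single parameter name or value.
    static String encodePart(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Leaves the path untouched and encodes only the names and values
    // of the query, keeping '?', '&' and '=' as separators.
    static String encodeQueryOnly(String url) {
        int q = url.indexOf('?');
        if (q < 0) {
            return url; // no query part, nothing to encode
        }
        StringBuilder out = new StringBuilder(url.substring(0, q)).append('?');
        String[] pairs = url.substring(q + 1).split("&");
        for (int i = 0; i < pairs.length; i++) {
            if (i > 0) {
                out.append('&');
            }
            int eq = pairs[i].indexOf('=');
            if (eq < 0) {
                out.append(encodePart(pairs[i])); // name without a value
            } else {
                out.append(encodePart(pairs[i].substring(0, eq)))
                   .append('=')
                   .append(encodePart(pairs[i].substring(eq + 1)));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints hi/hello?who=moris%5Cboris
        System.out.println(encodeQueryOnly("hi/hello?who=moris\\boris"));
    }
}
```

Because the encoding is form-style, a space in a value becomes "+" rather than "%20".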
You can try Spring's UriUtils. It seems to handle URL encoding/decoding correctly for the appropriate parts of the URL.
http://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/web/util/UriUtils.html
Use URLEncoder.encode(url, "UTF-8"); see the Javadoc.
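One thing to watch for: URLEncoder produces application/x-www-form-urlencoded output, so it escapes "/", "?" and "=" as well. The short sketch below (class and method names are illustrative only) compares encoding the whole string from the question with encoding just the parameter value.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UrlEncoderSketch {

    // Thin wrapper that hides the checked exception of URLEncoder.encode.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        // Whole URL: the separators get escaped too.
        // prints hi%2Fhello%3Fwho%3Dmoris%5Cboris
        System.out.println(encode("hi/hello?who=moris\\boris"));

        // Value only: the separators stay intact.
        // prints hi/hello?who=moris%5Cboris
        System.out.println("hi/hello?who=" + encode("moris\\boris"));
    }
}
```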