Scala or Java Library for fixing malformed URIs
Does anyone know of a good Scala or Java library that can fix common problems in malformed URIs, such as containing characters that should be escaped but aren't?
I've tested a few libraries, including the now-legacy URIUtil from HttpClient, without finding a viable solution. Typically, though, I've had enough success with this kind of java.net.URI construct:
    import java.net.URI;
    import java.net.URL;

    /**
     * Tries to construct a URL by breaking it up into its smallest elements
     * and encoding each component individually using the full URI constructor:
     *
     *   foo://example.com:8042/over/there?name=ferret#nose
     *   \_/   \______________/\_________/ \_________/ \__/
     *    |           |            |            |        |
     * scheme     authority       path        query   fragment
     */
    public URI parseUrl(String s) throws Exception {
        // URL parses leniently, so unescaped characters survive this step
        URL u = new URL(s);
        // The multi-argument URI constructor escapes illegal characters per component
        return new URI(
            u.getProtocol(),
            u.getAuthority(),
            u.getPath(),
            u.getQuery(),
            u.getRef());
    }
which may be used in combination with the following routine. It repeatedly decodes a URL until the decoded string no longer changes, which can be useful against, e.g., double encoding. Note that, to keep it simple, this sample doesn't include any failsafe.
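    import java.io.UnsupportedEncodingException;
    import java.net.URLDecoder;

    public String urlDecode(String url, String encoding) throws UnsupportedEncodingException, IllegalArgumentException {
        String result = URLDecoder.decode(url, encoding);
        // Recurse until a decoding pass leaves the string unchanged
        return result.equals(url) ? result : urlDecode(result, encoding);
    }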
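As a rough sketch of how the two routines might be combined (this is just one possible arrangement, not necessarily the only sensible one), the class below decodes first and then rebuilds the URI. The names UriRepair, decodeFully, repair and the maxRounds cap are made up for illustration; the cap stands in for the failsafe the sample above leaves out. Checked exceptions are wrapped in IllegalArgumentException purely to keep the example short.

```java
import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLDecoder;

class UriRepair {

    // Iterative variant of urlDecode: stops when a pass changes nothing,
    // or after maxRounds passes (hypothetical failsafe, not in the original).
    static String decodeFully(String url, String encoding, int maxRounds) {
        String current = url;
        try {
            for (int i = 0; i < maxRounds; i++) {
                String decoded = URLDecoder.decode(current, encoding);
                if (decoded.equals(current)) {
                    return decoded;
                }
                current = decoded;
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
        return current;
    }

    // Same idea as parseUrl above: split with URL, re-escape with URI.
    static URI parseUrl(String s) {
        try {
            URL u = new URL(s); // deprecated since Java 20, but still functional
            return new URI(u.getProtocol(), u.getAuthority(), u.getPath(),
                           u.getQuery(), u.getRef());
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot repair: " + s, e);
        }
    }

    // Assumed combination: undo any existing encoding, then encode once.
    static URI repair(String s) {
        return parseUrl(decodeFully(s, "UTF-8", 5));
    }

    public static void main(String[] args) {
        // Double-encoded space plus a raw space
        System.out.println(repair("http://example.com/a%2520b c"));
        // prints http://example.com/a%20b%20c

        // Without decoding first, an existing escape gets encoded again
        System.out.println(parseUrl("http://example.com/a%20b"));
        // prints http://example.com/a%2520b
    }
}
```

The second call shows why the decoding step matters: the multi-argument URI constructors always quote the percent character, so an already-escaped input is escaped a second time. Two caveats with decoding the whole string up front: URLDecoder also turns + into a space, and it throws IllegalArgumentException on a stray % that isn't followed by two hex digits. Decoding can also change the structure of the URL, for instance when an escaped ? or # in the path becomes a literal one.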
I would advise against using java.net.URLEncoder for percent-encoding URIs. Despite the name, it is not well suited to encoding URLs: it does not follow RFC 3986 and instead encodes to the application/x-www-form-urlencoded MIME format, in which, for example, a space becomes + rather than %20.
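To make the difference concrete, here is a small comparison sketch. The class and method names (EncoderComparison, formEncode, pathEncode) are hypothetical, and the Charset overload of URLEncoder.encode assumes Java 10 or later. It uses the four-argument java.net.URI constructor as a stand-in for an RFC 3986-style path encoder, which is only one of several ways to do this.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class EncoderComparison {

    // Form encoding: intended for HTML form data, not for URI paths.
    static String formEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Path encoding via URI(scheme, host, path, fragment); only the path is set.
    static String pathEncode(String path) {
        try {
            return new URI(null, null, path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot encode path: " + path, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formEncode("/a b/c")); // prints %2Fa+b%2Fc
        System.out.println(pathEncode("/a b/c")); // prints /a%20b/c
    }
}
```

URLEncoder escapes the slashes and writes the space as +, which a server reading a path will generally not treat as a space, while the URI constructor keeps the path separators and produces %20.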
For encoding URIs in Scala I would recommend the Uri class from spray-http. scala-uri is an alternative (disclaimer: I'm the author).