Converting and validating URLs from an untrusted source
I'm parsing a web page and collecting hrefs. Because the web page is an untrusted source, it can contain links with invalid syntax or non-ASCII characters. So, as I understand it, I need to:
1) convert spaces, non-ASCII characters, and other illegal characters
2) validate the string produced by step 1 (validity criteria: the URL can be typed into a browser, which can then retrieve the page it represents, and the URL can be passed to the URL/URI constructors and the corresponding page then retrieved. I can type some URLs in Firefox but can't construct instances from them in Java)
3) construct a java.net.URL/URI from the result of step 1 if it is valid
I have found two validation libraries: 1 and 2 (which one do you prefer?), but no adequate library for the first step; tools like java.net.URLDecoder/URLEncoder aren't intended for this purpose.
Can't you just try to construct a URL/URI from it inside a try/catch block? I think the class's constructor handles the validation automatically.
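A minimal sketch of that try/catch idea, assuming plain java.net classes are enough. The class name HrefValidator and the helper tryParse are made up for illustration, and the sketch only checks syntax; it does not check that the page can actually be retrieved.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Optional;

// Hypothetical helper class, not part of any library.
class HrefValidator {

    // Returns the parsed URL, or an empty Optional if the href is not
    // a syntactically valid absolute URL with a known protocol.
    static Optional<URL> tryParse(String href) {
        try {
            // Throws URISyntaxException on illegal characters such as spaces.
            URI uri = new URI(href.trim());
            // Throws IllegalArgumentException if the URI is not absolute,
            // MalformedURLException if no handler exists for the protocol.
            return Optional.of(uri.toURL());
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String[] hrefs = {
            "http://example.com/page?q=1",
            "http://example.com/a b",
            "/relative/path"
        };
        for (String href : hrefs) {
            System.out.println(href + " -> " + tryParse(href).isPresent());
        }
    }
}
```

Note that the single-argument URI constructor rejects a space instead of escaping it, so the conversion in step 1 still has to happen before this check if such links should be accepted.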