Trying to get just the URLs from output in Java
I'm new to Java and have been looking for a solution; perhaps I'm not searching with the right terminology.
My goal: I have a Java class that uses WebDriver to go to a page, perform a search, and output the results. The output is plain text with URLs mixed in. All I care about are the URLs returned. So basically, I want to take output like:
Search result 1
http://www.somesite.com/blahblah
this is a site from the search results.
But all I want is the URL; I want to discard the rest of the output. I've looked into "parsing in Java" but haven't found what I'm looking for. Any pointers would be much appreciated.
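If the output really does put each URL on its own line, as in the sample above, a regex may not even be necessary. Here is a minimal sketch that assumes that one-URL-per-line layout holds. The class and method names (UrlLineFilter, urlLines) are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class UrlLineFilter {

    // Keeps only lines that start with http:// or https://, dropping titles and descriptions.
    // Assumption (hypothetical): each URL sits on its own line, as in the sample output.
    public static List<String> urlLines(String output) {
        List<String> urls = new ArrayList<>();
        for (String line : output.split("\\R")) { // \R matches any line break (Java 8+)
            String trimmed = line.trim();
            if (trimmed.startsWith("http://") || trimmed.startsWith("https://")) {
                urls.add(trimmed);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String output = "Search result 1\n"
                + "http://www.somesite.com/blahblah\n"
                + "this is a site from the search results.";
        for (String url : urlLines(output)) {
            System.out.println(url); // prints http://www.somesite.com/blahblah
        }
    }
}
```

If a URL can appear in the middle of a line, this filter will miss it, and a pattern-based search is the safer choice.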
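import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlExtractor {
    public static void main(String[] args) {
        String output = "Search result 1 http://www.somesite.com/blahbl+ah1 this is a site from the search results.\n"
                + "Search result 1 http://www.somesite.com/blahblah2 this is a site from the search results.";
        // "https?://" accepts either scheme; "\\S+" takes every character up to the next whitespace
        Matcher matcher = Pattern.compile("https?://\\S+").matcher(output);
        while (matcher.find()) { // each find() resumes scanning after the previous match
            System.out.println(matcher.group());
        }
    }
}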
Check out the java.util.regex package: http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/package-summary.html
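On Java 9 or later, the same kind of pattern can collect every match into a list without a manual find() loop by using Matcher.results(). This is a sketch. UrlRegexExtractor and extractUrls are hypothetical names, and the https? prefix assumes you also want secure links:

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class UrlRegexExtractor {

    // Compiled once and reused; "https?://" accepts either scheme, "\\S+" runs to the next whitespace.
    private static final Pattern URL = Pattern.compile("https?://\\S+");

    // Matcher.results() (Java 9+) streams every match in order, so no explicit find() loop is needed.
    public static List<String> extractUrls(String text) {
        return URL.matcher(text)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String output = "Search result 1 http://www.somesite.com/blahblah this is a site.\n"
                + "Search result 2 https://example.org/page?q=1 another site.";
        extractUrls(output).forEach(System.out::println);
    }
}
```

Because \S+ stops only at whitespace, a URL that is immediately followed by punctuation, such as a sentence-ending period, will keep that character. Trim it afterwards if that matters for your output.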
There are other ways to parse the text, of course, but going the regex route is probably the cleanest.
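One of those other ways is to split the output on whitespace and let java.net.URI decide which tokens are absolute http or https URLs. This is a sketch under that assumption. UrlTokenExtractor is a hypothetical name:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

public class UrlTokenExtractor {

    // Splits on whitespace and keeps tokens that java.net.URI parses as http(s) URIs with a host.
    public static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            try {
                URI uri = new URI(token);
                String scheme = uri.getScheme(); // null for relative tokens such as "Search" or "1"
                if (("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme))
                        && uri.getHost() != null) {
                    urls.add(token);
                }
            } catch (URISyntaxException e) {
                // Not a syntactically valid URI (for example "note:" or illegal characters): skip it.
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String output = "Search result 1 http://www.somesite.com/blahblah this is a site from the search results.";
        extractUrls(output).forEach(System.out::println); // prints http://www.somesite.com/blahblah
    }
}
```

The regex version is shorter. This one is stricter about what counts as a URL: it rejects other schemes such as mailto: or ftp:, and anything URI cannot parse or that has no host.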