How to get a web page's content before visiting the page
How can I get the description/content of a web page for a given URL (something like the short description Google shows for each result link)? I want to do this in my JSP page.
Thanks in advance!
Idea: Open the URL as a stream, then parse the HTML and read the content of its description meta tag.
Grab URL content:
URL url = new URL("http://www.url-to-be-parsed.com/page.html");
BufferedReader in = new BufferedReader(
        new InputStreamReader(url.openStream()));
You will need to tweak the above code depending on what your HTML parser library requires (a stream, a string, etc.).
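If your parser wants the whole page as a String, the snippet above could be completed roughly as follows. This is only a sketch using plain JDK classes: the class name PageFetcher and its two helper methods are made up for illustration, and UTF-8 is an assumption (a real page may declare a different charset).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class PageFetcher {

    // Drains a Reader into a String, whatever the Reader's source is.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Opens the URL as a stream and returns its content as text.
    // Assumption: the page is encoded as UTF-8.
    static String fetch(URL url) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            return readAll(in);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java PageFetcher <url>");
            return;
        }
        String html = fetch(new URL(args[0]));
        System.out.println(html.length() + " characters read");
    }
}
```

The try-with-resources block closes the stream even if reading fails, which matters when this runs inside a long-lived JSP container.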
HTML-Parse the tags:
<meta name="description" content="This is a place where webmasters can put a description about this web page" />
You might also be interested in grabbing the title of that page:
<title>This is the title of the page!</title>
Caution: Regular expressions do not work reliably on HTML documents, so an HTML parser is the better choice.
An example with HTML Parser:
- Use HasAttributeFilter to filter by tags that have the name="description" attribute
- Try a Node ---> MetaTag cast
- Get the content using MetaTag.getAttribute()
Code:
import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.tags.MetaTag;

public class HTMLParserTest {
    public static void main(String... args) {
        Parser parser = new Parser();
        // <meta name="description" content="Some text about the site." />
        HasAttributeFilter filter = new HasAttributeFilter("name", "description");
        try {
            parser.setResource("http://www.youtube.com");
            NodeList list = parser.parse(filter);
            Node node = list.elementAt(0);
            if (node instanceof MetaTag) {
                MetaTag meta = (MetaTag) node;
                String description = meta.getAttribute("content");
                System.out.println(description);
                // Prints: "YouTube is a place to discover, watch, upload and share videos."
            }
        } catch (ParserException e) {
            e.printStackTrace();
        }
    }
}
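If adding the HTML Parser jar is not an option, the same two values (description and title) can probably be pulled out with the HTML parser that ships with the JDK in javax.swing.text.html. This is a different parser from the one used above and it only understands fairly old HTML, so treat this as a rough sketch; the class names JdkMetaExtractor and PageInfo are invented for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper built on the JDK's bundled HTML parser.
final class JdkMetaExtractor {

    // Callback that remembers the title and the description meta tag.
    static final class PageInfo extends HTMLEditorKit.ParserCallback {
        String title;
        String description;
        private boolean inTitle;

        @Override
        public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            if (tag == HTML.Tag.TITLE) inTitle = true;
        }

        @Override
        public void handleEndTag(HTML.Tag tag, int pos) {
            if (tag == HTML.Tag.TITLE) inTitle = false;
        }

        @Override
        public void handleText(char[] data, int pos) {
            // Keep only the first text chunk seen inside <title>.
            if (inTitle && title == null) title = new String(data);
        }

        @Override
        public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            if (tag != HTML.Tag.META) return;
            Object name = attrs.getAttribute(HTML.Attribute.NAME);
            if (name != null && "description".equalsIgnoreCase(name.toString())) {
                Object content = attrs.getAttribute(HTML.Attribute.CONTENT);
                if (content != null) description = content.toString();
            }
        }
    }

    // Parses the HTML from the reader and returns what the callback collected.
    static PageInfo extract(Reader html) throws IOException {
        PageInfo info = new PageInfo();
        // true = ignore charset declarations instead of throwing on them
        new ParserDelegator().parse(html, info, true);
        return info;
    }

    public static void main(String[] args) throws IOException {
        String sample = "<html><head><title>This is the title of the page!</title>"
                + "<meta name=\"description\" content=\"A description about this web page\">"
                + "</head><body></body></html>";
        PageInfo info = extract(new StringReader(sample));
        System.out.println(info.title);
        System.out.println(info.description);
    }
}
```

Either field stays null when the page has no matching tag, so check for null before printing it in the JSP.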
Considerations:
If this is done in a JSP each time the page is loaded, expect a slowdown from the network I/O to the URL. It gets worse if you do this on the fly for a page of yours that contains many links: the n URLs are fetched one after another, so the slowdown could be massive. Consider storing this information in a database and refreshing it as needed instead of fetching it on the fly in the JSPs.
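To illustrate the "store it and refresh as needed" idea, here is a minimal sketch that keeps descriptions in memory with a maximum age. It uses a ConcurrentHashMap rather than a database to stay short; the class DescriptionCache, its loader function, and the ten-minute age are all assumptions for the example, and the loader stands in for whatever fetch-and-parse code you use.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical in-memory cache; a database table could play the same role.
final class DescriptionCache {

    private static final class Entry {
        final String description;
        final long fetchedAtMillis;

        Entry(String description, long fetchedAtMillis) {
            this.description = description;
            this.fetchedAtMillis = fetchedAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Function<String, String> loader; // URL -> description
    private final long maxAgeMillis;

    DescriptionCache(Function<String, String> loader, long maxAgeMillis) {
        this.loader = loader;
        this.maxAgeMillis = maxAgeMillis;
    }

    // Returns the cached description, reloading it only when missing or too old.
    String get(String url) {
        long now = System.currentTimeMillis();
        Entry entry = entries.get(url);
        if (entry == null || now - entry.fetchedAtMillis > maxAgeMillis) {
            entry = new Entry(loader.apply(url), now);
            entries.put(url, entry);
        }
        return entry.description;
    }

    public static void main(String[] args) {
        // Stand-in loader; real code would fetch the page and parse its meta tag.
        DescriptionCache cache = new DescriptionCache(
                url -> "description of " + url, 10 * 60 * 1000L);
        System.out.println(cache.get("http://www.url-to-be-parsed.com/page.html"));
        // The second call is served from memory without invoking the loader.
        System.out.println(cache.get("http://www.url-to-be-parsed.com/page.html"));
    }
}
```

Two requests arriving at the same moment may both invoke the loader for the same URL; that is harmless here, but a stricter version would need per-key locking.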