HTML code cleaner
Is there a library or method that takes a String containing HTML code and returns another String without the HTML, just the text content?
I have been looking at libraries such as JTidy and HtmlParser, but I don't know how to use them. Is there something easier?
HTML Screen Scraping Tools Written in Java
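This removes the HTML tags from the given String. It does not decode entities such as &amp;, and it leaves the text inside <script> and <style> elements in place.
String html = ...; // the HTML input
// [^>]* also matches line breaks, so tags that span several lines are removed too
html = html.replaceAll("<[^>]*>", "");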
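The one-liner above can be extended into a slightly more careful helper. What follows is only a minimal sketch, not a full HTML parser: the class name HtmlStripper and the method stripTags are made up for illustration, it decodes just a handful of common entities, and it assumes that replacing each tag with a space is acceptable for your text.

```java
import java.util.regex.Pattern;

public class HtmlStripper {
    // Matches <script> and <style> elements together with their content.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    // Matches any remaining tag, including tags that span several lines.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Hypothetical helper: returns the visible text of an HTML fragment.
    public static String stripTags(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll("");
        // Replace each tag with a space so adjacent words do not run together.
        text = TAG.matcher(text).replaceAll(" ");
        // Decode only a few common entities; &amp; goes last to avoid double decoding.
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        // Collapse the whitespace left behind by the removed tags.
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p { color: red; }</style></head>"
                + "<body><p>Hello <b>world</b> &amp; friends</p></body></html>";
        System.out.println(stripTags(html)); // prints "Hello world & friends"
    }
}
```

Regular expressions cannot handle every corner of HTML (comments containing ">", unquoted attributes, and so on), so for untrusted or messy input a real parser is the safer choice.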
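But if you're looking to screen-scrape, you can use XPath to pull out specific parts of the HTML. Note that this approach parses the input as XML, so it only works on well-formed markup (XHTML); messy real-world HTML has to be cleaned up first, for example with JTidy.
// Needs: java.io.StringReader, javax.xml.transform.*, javax.xml.transform.dom.DOMResult,
// javax.xml.transform.stream.StreamSource, javax.xml.xpath.*, org.w3c.dom.Node
StreamSource source = new StreamSource(new StringReader(html));
DOMResult result = new DOMResult();
// An identity transform copies the parsed input into a DOM tree
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(source, result);
Node root = result.getNode();
XPath xpath = XPathFactory.newInstance().newXPath();
// Returns the text of the first node that matches the expression
String value = xpath.evaluate("/the/xpath/expression", root);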
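To make the XPath approach concrete, here is a self-contained sketch that wraps those steps in a method and runs them on a small page. The class name XPathScrape, the method extract, and the sample markup are invented for illustration, and the input is assumed to be well-formed XHTML without a namespace or DOCTYPE.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;

public class XPathScrape {
    // Hypothetical helper: evaluates an XPath expression against well-formed markup
    // and returns the text of the first matching node ("" if nothing matches).
    public static String extract(String xhtml, String expression) {
        try {
            StreamSource source = new StreamSource(new StringReader(xhtml));
            DOMResult result = new DOMResult();
            // The identity transform parses the input and builds a DOM tree.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.transform(source, result);
            Node root = result.getNode();
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, root);
        } catch (TransformerException | XPathExpressionException e) {
            // Thrown when the markup is not well-formed or the expression is invalid.
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Example page</title></head>"
                + "<body><h1>Heading</h1><p>First paragraph.</p></body></html>";
        System.out.println(extract(page, "/html/head/title")); // prints "Example page"
        System.out.println(extract(page, "/html/body/p[1]"));  // prints "First paragraph."
    }
}
```

If the page declares the XHTML namespace, plain paths such as /html/body will not match, and the XPath object would need a NamespaceContext; that case is left out of this sketch.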