
HTML code cleaner

Is there a library or method that takes a String containing HTML code and returns another String without the HTML, just the text content?

I have been looking at libraries such as JTidy and HtmlParser, but I don't know how to use them. Is there something easier?


HTML Screen Scraping Tools Written in Java


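This removes all HTML tags from the given String. Character entities such as &amp; are left as they are.

String html = "<p>Hello, <b>world</b>!</p>";
// [^>]* also matches line breaks, so tags that span several lines are removed too
String text = html.replaceAll("<[^>]*>", "");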
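As a rough sketch of how the regex approach above could be wrapped into a reusable method: the class name HtmlTextExtractor and the method stripTags are made up for illustration, and the choice to replace each tag with a space (so that text from adjacent elements does not run together) is an assumption, not part of the original answer. A regex like this does not understand HTML, so the contents of script and style elements and any character entities pass through unchanged.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class HtmlTextExtractor {
    // Matches everything from '<' up to the next '>'. The class [^>] also
    // matches line breaks, so tags split across lines are covered.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Replaces every tag with a space, then collapses runs of whitespace.
    static String stripTags(String html) {
        String withoutTags = TAG.matcher(html).replaceAll(" ");
        return WHITESPACE.matcher(withoutTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<p>Hello</p>\n<div\n class=\"x\">world</div>";
        System.out.println(stripTags(html));
    }
}
```

Replacing tags with a space instead of an empty string keeps "<p>Hello</p><p>world</p>" from turning into "Helloworld", at the cost of occasionally leaving a space before punctuation that follows an inline tag.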

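But if you're looking to screen-scrape, you can use XPath to pull out specific parts of the HTML. Note that this parses the input as XML, so it only works when the markup is well-formed (XHTML, or HTML cleaned up first with a tool such as JTidy):

import java.io.StringReader;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.*;
import org.w3c.dom.Node;

// An identity transform parses the markup into a DOM tree
StreamSource source = new StreamSource(new StringReader(html));
DOMResult result = new DOMResult();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(source, result);
Node root = result.getNode();

XPath xpath = XPathFactory.newInstance().newXPath();

// Returns the text of the first node matching the expression, or "" if none matches
String value = xpath.evaluate("/the/xpath/expression", root);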
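To make the XPath approach concrete, here is a minimal self-contained sketch using only the JDK's built-in XML APIs. The class name XPathScrape, the method extract, and the sample markup are assumptions made up for illustration. The input must be well-formed XML; markup that is not (for example an unclosed tag) makes the transform fail, which this sketch reports as an IllegalArgumentException.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;

// Hypothetical helper class; the names are illustrative only.
class XPathScrape {
    // Parses well-formed XHTML into a DOM tree via an identity transform,
    // then returns the string value of the first node matching the expression.
    static String extract(String xhtml, String expression) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            DOMResult result = new DOMResult();
            identity.transform(new StreamSource(new StringReader(xhtml)), result);
            Node root = result.getNode();

            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, root);
        } catch (TransformerException | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><h1>Title</h1><p>Body text</p></body></html>";
        System.out.println(extract(page, "/html/body/h1"));
    }
}
```

For real-world pages that are not well-formed, the markup would need to be tidied into XHTML before this step.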