Parse HTML Offline

Are there any HTML parsers that can parse HTML documents offline, i.e. files stored on your computer? If so, can anyone name some good ones?

UPDATE: Hah, never mind, I found the answer. Would anyone be able to provide an example of this using Jericho HTML Parser?

UPDATE2: I thought I had found the answer, but I was wrong: I mistook InputStream for FileInputStream :(


Here are a few you could look at:

  • For Python: BeautifulSoup
  • For .NET: HTML Agility Pack
  • For Java: TagSoup


How about HTML Parser?


Nutch has an HTML parser as a subcomponent. Javadoc here.
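
On the InputStream / FileInputStream mix-up from the question: FileInputStream is a subclass of InputStream, so any parser that accepts an InputStream (or a Reader) can read a page saved on disk. Here is a rough sketch of that idea. It is not Jericho code; to keep it runnable without extra jars it swaps in the old HTML parser that ships with the JDK (javax.swing.text.html.parser.ParserDelegator). The class name OfflineHtmlTitle and the method extractTitle are made up for illustration, and the file is assumed to be UTF-8.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, named here only for illustration.
class OfflineHtmlTitle {

    // Accepts any InputStream. A FileInputStream opened on a local file
    // qualifies, so no network access is involved.
    static String extractTitle(InputStream in) throws IOException {
        StringBuilder title = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean inTitle = false;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TITLE) {
                    inTitle = true;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.TITLE) {
                    inTitle = false;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Collect text only while inside <title>...</title>.
                if (inTitle) {
                    title.append(data);
                }
            }
        };

        // Assumption: the file is UTF-8 encoded.
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        // The last argument (true) tells the parser to ignore charset
        // declarations inside the document instead of throwing on them.
        new ParserDelegator().parse(reader, callback, true);
        return title.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java OfflineHtmlTitle path/to/page.html
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(extractTitle(in));
        }
    }
}
```

The JDK parser is dated and targets old HTML, so for real pages one of the libraries named above is probably the better choice. The same pattern should carry over to them: open the file as a FileInputStream (or a FileReader) and hand it to whichever entry point takes a stream.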
