Parse HTML Offline
Are there any HTML parsers that can parse HTML documents offline, i.e. files stored on your computer? If so, can anyone name some good ones please?
UPDATE: Hah, never mind, I found the answer. Would anyone be able to provide an example of this with the Jericho HTML Parser?
UPDATE2: I thought I had found the answer, but I was wrong: I mistook InputStream for FileInputStream :(
Here are a few you could look at:
- For Python: BeautifulSoup
- For .NET: HTML Agility Pack
- For Java: TagSoup
How about HTML Parser?
Nutch has an HTML parser as a subcomponent. Javadoc here.
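On the InputStream vs FileInputStream mix-up in the question: FileInputStream is a subclass of InputStream, so any parser method that accepts an InputStream can be given a file from your disk. Here is a minimal sketch using only the JDK. The class name `OfflineHtmlDemo` and the method `readHtml` are made up for illustration; `readHtml` is just a stand-in for whatever parser entry point you end up using (I believe Jericho's `Source` can be constructed from an InputStream, but check its Javadoc).

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class OfflineHtmlDemo {

    // Hypothetical stand-in for a parser entry point that takes an InputStream.
    // It only reads all bytes and decodes them as UTF-8 (an assumption about
    // the file's encoding); a real parser would build a document tree here.
    static String readHtml(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Opens a file on the local disk and passes it along as a plain InputStream.
    // This compiles because FileInputStream extends InputStream.
    static String readHtmlFile(String path) throws IOException {
        try (InputStream in = new FileInputStream(path)) {
            return readHtml(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java OfflineHtmlDemo /path/to/page.html
        System.out.println(readHtmlFile(args[0]));
    }
}
```

So "offline" parsing is mostly a matter of where the stream comes from: swap `readHtml` for the library call that accepts an InputStream and the rest stays the same.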