How to read HTML using C++ libraries in Linux?

Are there any C++ libraries available to read HTML in Linux?


libcurl is your friend for fetching the page, plus tidy (HTML Tidy) if you've got broken HTML to fix.

Edit: Here is the full sequence:

HTML (in a file) -> tidy (which cleans up the malformed HTML) -> XSLT transformation using libxml2/libxslt (http://xmlsoft.org/), for which you'll need to provide an XSL file that translates your HTML to LaTeX -> the resulting LaTeX document is then processed with latex (by forking out to the latex command). If you want, you could also download the source code for LyX and see how they do it (http://www.lyx.org/). Unfortunately the sequence is too complex to write up as a single complete example, so all I can give you is the sequence.
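
For a rough idea of the shape, here is a minimal sketch that shells out to the command-line tools (tidy, xsltproc, latex) instead of linking libtidy and libxslt, which is a simplification of the sequence above and not a full implementation. The file names, the Job struct and the function names are all made up for illustration, and you still have to write the XSL stylesheet yourself. It assumes the three tools are installed and on your PATH.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>
#include <sys/wait.h>  // WIFEXITED / WEXITSTATUS (POSIX, fine on Linux)

// Hypothetical file names for one conversion job (not from the original answer).
struct Job {
    std::string html;   // input, possibly malformed HTML
    std::string xhtml;  // output of tidy: well-formed XHTML
    std::string xsl;    // your own stylesheet mapping XHTML to LaTeX
    std::string tex;    // generated LaTeX source
};

// Wrap a path in single quotes so the shell treats it as one word.
std::string shell_quote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";  // close quote, escaped quote, reopen
        else out += c;
    }
    out += "'";
    return out;
}

// Build the three commands of the sequence, in the order they must run.
std::vector<std::string> build_commands(const Job& j) {
    assert(!j.html.empty() && !j.xhtml.empty() && !j.xsl.empty() && !j.tex.empty());
    return {
        // -asxhtml: emit XHTML; -numeric: numeric entities, so no DTD is needed
        "tidy -quiet -asxhtml -numeric -o " + shell_quote(j.xhtml) + " " + shell_quote(j.html),
        // --novalid: do not try to load the XHTML DTD named in the DOCTYPE
        "xsltproc --novalid -o " + shell_quote(j.tex) + " " + shell_quote(j.xsl) + " " + shell_quote(j.xhtml),
        "latex -interaction=nonstopmode " + shell_quote(j.tex),
    };
}

// True if the command ran and exited with a status no greater than max_ok.
bool run_step(const std::string& cmd, int max_ok) {
    int status = std::system(cmd.c_str());
    return status != -1 && WIFEXITED(status) && WEXITSTATUS(status) <= max_ok;
}

// Runs the steps in order. Returns -1 on success, else the index of the failed step.
int run_pipeline(const Job& j) {
    const std::vector<std::string> cmds = build_commands(j);
    const int max_ok[] = {1, 0, 0};  // tidy exits with 1 when it only had warnings
    for (std::size_t i = 0; i < cmds.size(); ++i) {
        if (!run_step(cmds[i], max_ok[i])) return static_cast<int>(i);
    }
    return -1;
}
```

You would call it as run_pipeline(Job{"in.html", "out.xhtml", "to-latex.xsl", "out.tex"}) and check the result against -1. Keep in mind that tidy puts its XHTML output in the XHTML namespace, so the stylesheet has to match namespaced elements.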


Have a look at the following:

  • htmlcxx
  • wxHTML

Also, a similar question has already been asked.


Try http://xmlsoft.org/

libxml2 can parse HTML, is written in ANSI C, and has bindings available for many languages.
