How to read HTML using C++ libraries in Linux?
Are there any C++ libraries available to read HTML in Linux?
libcurl is your friend for fetching the page, plus tidy (HTML Tidy) if you've got broken HTML to fix.
Edit: Here is the full sequence:
HTML (in a file) -> tidy (which cleans up the malformed HTML) -> XSLT transformation using libxml/libxslt (http://xmlsoft.org/); you'll need to provide an XSL file that translates your HTML to LaTeX -> the resulting LaTeX document is then processed with latex (by forking out to the latex command). Alternatively, you could download the source code for LyX and see how they do it (http://www.lyx.org/). Unfortunately the sequence is too complex to fit in a single example, so all I can give you is the sequence.
Have a look at the following:
- htmlcxx
- wxHTML
A similar question has also been asked before.
Try http://xmlsoft.org/
libxml2 can parse HTML, is written in ANSI C, and has bindings for many languages.