Parsing XML with nodes containing HTML in Qt

I am trying to parse, in Qt, an XML file in which some nodes contain HTML. It looks like this:

<root>
 <list>
  <element>Some <i>text<i></element>
  <element><b>another line of text<b></element>
  <element><i>Tag opened here</element>
  <element>and closed here</i></element>
 </list>
</root>

I tried different approaches in Qt, but I could not find an easy way to get the HTML out of a node.

QDomDocument:

The only way I found to get the markup of a QDomElement is the save() function, but that gives me the whole line "<element>...</element>", not just the inner content.
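If the save() route is used anyway, the outer tags can be cut off the serialized string afterwards. The following is only a minimal sketch of that idea in plain C++: innerMarkup is a hypothetical helper (not part of Qt), and it assumes the string has the "<element>...</element>" shape described above. It also assumes the document parsed in the first place, which QDomDocument will only do for well-formed XML.

```cpp
#include <string>

// Hypothetical helper (not a Qt API): returns whatever lies between the
// first '>' (end of the opening tag) and the last "</" (start of the
// closing tag) of a serialized element.
std::string innerMarkup(const std::string &serialized)
{
    const std::string::size_type open = serialized.find('>');
    const std::string::size_type close = serialized.rfind("</");
    if (open == std::string::npos || close == std::string::npos || close <= open)
        return std::string(); // self-closing or unexpected input: nothing inside
    return serialized.substr(open + 1, close - open - 1);
}
```

With a QString the same slicing could be done with indexOf(), lastIndexOf() and mid().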

QXmlStreamReader:

There is the function readElementText(QXmlStreamReader::IncludeChildElements), but it strips the HTML tags, so the text of the first element would be only "Some text".

Can this be done in a more effective way?

I also thought of another solution. What do you think about it?

How about wrapping the contents of the <element> tags in CDATA sections (using string replace or regex functions) before the XML file is parsed?
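To make the CDATA idea concrete, here is a rough sketch of the pre-processing step. It uses std::regex from the standard library rather than Qt classes so that it stands alone; wrapElementsInCdata is a hypothetical name, and the pattern assumes the tags are written exactly as <element> with no attributes and that the content never contains "]]>" itself.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: wraps the body of every <element>...</element>
// pair in a CDATA section so an XML parser treats it as opaque text.
std::string wrapElementsInCdata(const std::string &xml)
{
    // [\s\S]*? matches lazily (newlines included) up to the first
    // closing </element>, so each element is wrapped separately.
    static const std::regex elementBody("<element>([\\s\\S]*?)</element>");
    return std::regex_replace(xml, elementBody,
                              "<element><![CDATA[$1]]></element>");
}
```

Calling it on the first sample line, <element>Some <i>text<i></element>, gives <element><![CDATA[Some <i>text<i>]]></element>. After that step an XML parser should hand back the raw markup as the element's text, unclosed tags included.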


Neither QDomDocument nor QXmlStreamReader is able to parse HTML. They are XML parsers. To parse HTML in Qt you should use QtWebKit.

#include <QtCore>
#include <QtGui>
#include <QtWebKit>

int main(int argc, char ** argv)
{
    QApplication app(argc, argv);

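    // Sample markup from the question. Adjacent string literals are
    // concatenated by the compiler, so no line-continuation backslashes are needed.
    QString html =
        "<root>"
        " <list>"
        "  <element>Some <i>text<i></element>"
        "  <element><b>another line of text<b></element>"
        "  <element><i>Tag opened here</element>"
        "  <element>and closed here</i></element>"
        " </list>"
        "</root>";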

    QWebPage page;
    page.mainFrame()->setHtml(html);
    QWebElement htmlElement = page.mainFrame()->findFirstElement("root list element i");
    qDebug() << htmlElement.toPlainText();

    return app.exec();
}

Output:

"text"


With the DOM API, the method to use for this should be nodeValue().
