How to strip HTML from a text property of a Qt4 widget?
What's the easiest way, in terms of coding effort, to change a text property of a Qt4 widget, e.g. QLabel.text, so that all HTML tags are removed?
The HTML is simple, typically just one to three tags like or and their closing partners.
If you don't want to use a widget for that, you can use QTextDocument::toPlainText()
QTextDocument doc;
doc.setHtml(htmlText); // htmlText is a QString containing the markup
QString plainText = doc.toPlainText();
I've used this in the past, although a widget seems like overkill: QTextEdit, the rich text edit widget. It works because the constructor that takes a QString interprets the text as HTML.
QTextEdit htmlTextEdit(HtmlText); // HtmlText is any QString with HTML tags.
QString plainText = htmlTextEdit.toPlainText();
It sounds like you are really just looking for a way to strip HTML tags from a string, which is not specific to Qt widgets (unless you want a solution that takes advantage of the rest of the Qt library). Anyway, there is no shortage of hits when searching for "strip html from string". There seem to be two general approaches:
- Use a regular expression (here there be dragons)
- Use an html parser
You may find a regex that is good enough for your purposes, but you will need a proper HTML parser to do it right.
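As a rough illustration of the regex option, here is a minimal sketch in standard C++ with no Qt dependency. The helper name stripTags is made up for this example and is not part of Qt or any library. It assumes the markup is as simple as described in the question (a few plain tags and their closing partners); it does not decode entities such as &amp; and it can be fooled by a '>' inside an attribute value, which is exactly why a parser is the safer route.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not a Qt or library function): removes every
// substring that starts with '<' and runs to the next '>'.
// Assumption: the input contains only simple tags, with no '>' inside
// attribute values, no comments, and no scripts.
std::string stripTags(const std::string &html)
{
    // '<', then any run of characters that are not '>', then '>'.
    static const std::regex tag("<[^>]*>");
    // Replace each match with the empty string; other text is untouched.
    return std::regex_replace(html, tag, "");
}
```

With a Qt string you would convert at the edges, for example with QString::toStdString() and QString::fromStdString(), before passing the result to the widget's setText().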
This Stack Overflow question has a lot of discussion about the regex option (although the question is looking to strip all tags except links).
Since you are using Qt, this question has an answer with examples of using a parser from that library.
Why not peek under the hood in QTextEdit::toPlainText() source code, and see what is done there?