
Getting raw text from JTextPane

In my application, I use a JTextPane to display some log information. As I want to highlight some specific lines in this text (for example the error messages), I set the contentType to "text/html". This way, I can format my text.

Now, I create a JButton that copies the content of this JTextPane into the clipboard. That part is easy, but my problem is that when I call myTextPane.getText(), I get the HTML code, such as:


    <font color="#FFCC66"><b>foobar</b></font><br>

instead of getting only the raw content:

    foobar

Is there a way to get only the content of my JTextPane in plain text? Or do I need to transform the HTML into raw text by myself?

No need to use the ParserCallback. Just use:

// Document.getText(int, int) throws the checked BadLocationException
textPane.getDocument().getText(0, textPane.getDocument().getLength());
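
For context, here is a minimal sketch of that call wrapped in a helper. The class name `PaneText` and the method `plainText` are made up for illustration; only `Document.getText(int, int)` and `getLength()` come from Swing. The text returned for an HTML document may carry leading or trailing line breaks, so the demo trims it before printing.

```java
import javax.swing.JTextPane;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;

// Hypothetical helper class; the name is not part of Swing.
final class PaneText {

    /** Returns the text content of the document, without any markup. */
    static String plainText(Document doc) {
        try {
            return doc.getText(0, doc.getLength());
        } catch (BadLocationException e) {
            // Not expected for the range 0..getLength(); rethrow unchecked.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // In a real application, touch Swing components on the Event Dispatch Thread.
        JTextPane textPane = new JTextPane();
        textPane.setContentType("text/html");
        textPane.setText("<font color=\"#FFCC66\"><b>foobar</b></font><br>");

        // getText() on the pane would return HTML; the document returns plain text.
        System.out.println(plainText(textPane.getDocument()).trim());
    }
}
```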

Based on the accepted answer to: Removing HTML from a Java String

MyHtml2Text parser = new MyHtml2Text();
try {
    parser.parse(new StringReader(myTextPane.getText()));
} catch (IOException ee) {
    // handle exception
}
String plainText = parser.getText();

Slightly modified version of the Html2Text class from the answer I linked to:

import java.io.IOException;
import java.io.Reader;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;

public class MyHtml2Text extends HTMLEditorKit.ParserCallback {
    StringBuffer s;
    public MyHtml2Text() {}

    public void parse(Reader in) throws IOException {
        s = new StringBuffer();
        ParserDelegator delegator = new ParserDelegator();
        // the third argument tells the parser to ignore charset directives
        delegator.parse(in, this, Boolean.TRUE);
    }

    // called by the parser for each run of text between tags
    public void handleText(char[] text, int pos) {
        s.append(text);
    }

    public String getText() {
        return s.toString();
    }
}

If you need more fine-grained handling, consider overriding more of the callback methods defined by HTMLEditorKit.ParserCallback.
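
As one possible illustration of overriding another callback, the sketch below turns `<br>` into a line break by also implementing `handleSimpleTag`. The class name `LineBreakHtml2Text` and the static helper `toText` are hypothetical; treating `<br>` as a newline is my assumption about what "more fine-grained" might mean here, not something the linked answer prescribes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical variant of the callback that keeps <br> as a line break.
final class LineBreakHtml2Text extends HTMLEditorKit.ParserCallback {
    private final StringBuilder s = new StringBuilder();

    @Override
    public void handleText(char[] text, int pos) {
        s.append(text);
    }

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        // Empty elements such as <br> are reported here rather than as start/end pairs.
        if (tag == HTML.Tag.BR) {
            s.append('\n');
        }
    }

    /** Parses the HTML and returns its text, with one newline per <br>. */
    static String toText(String html) {
        LineBreakHtml2Text callback = new LineBreakHtml2Text();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // A StringReader should not fail; rethrow unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        return callback.s.toString();
    }

    public static void main(String[] args) {
        System.out.println(toText("<b>first line</b><br>second line"));
    }
}
```

The same pattern extends to `handleStartTag` and `handleEndTag` if, for example, paragraphs should also be separated.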

Unfortunately, you need to do it yourself. Imagine if some of the content were HTML-specific, e.g. images: the text representation is unclear. Should the alt text be included or not, for instance?

(Is RegExp allowed? This isn't parsing, is it?)

Take the getText() result and use String.replaceAll() to filter out all tags, then call trim() to remove leading and trailing whitespace. For the whitespace between your first and your last 'blabla' I don't see a general solution. Maybe you can split the rest around CRLF and trim all the strings again.
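
A rough sketch of that idea follows. The pattern `<[^>]*>` and the names `TagStripper` and `stripTags` are my own assumptions, not something from the question. It does not decode entities such as `&amp;`, and it will misbehave if the text itself contains `<` or `>`.

```java
// Hypothetical helper; a crude tag filter, not an HTML parser.
final class TagStripper {

    /** Removes anything that looks like a tag, then trims each line and drops empty ones. */
    static String stripTags(String html) {
        String noTags = html.replaceAll("<[^>]*>", "");
        StringBuilder out = new StringBuilder();
        // Split on CRLF or LF and trim every line separately.
        for (String line : noTags.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            if (out.length() > 0) {
                out.append('\n');
            }
            out.append(trimmed);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<font color=\"#FFCC66\"><b>foobar</b></font><br>"));
    }
}
```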

(I'm no regexp expert - maybe someone can provide the regexp and earn some reputation ;) )


I just assumed that you don't use < and > in your text; otherwise it becomes, let's say, a challenge.





