开发者

Getting raw text from JTextPane

In my application, I use a JTextPane to display some log information. As I want to hightlight some specific lines in this text (for example the error messages), I set the contentType as "text/html". This way, I can format my text.

Now, I c开发者_运维技巧reate a JButton that copies the content of this JTextPane into the clipboard. That part is easy, but my problem is that when I call myTextPane.getText(), I get the HTML code, such as :

<html>
  <head>

  </head>
  <body>
    blabla<br>
    <font color="#FFCC66"><b>foobar</b></font><br>
    blabla
  </body>
</html>

instead of getting only the raw content:

blabla
foobar
blabla

Is there a way to get only the content of my JTextPane in plain text? Or do I need to transform the HTML into raw text by myself?


No need to use the ParserCallback. Just use:

textPane.getDocument().getText(0, textPane.getDocument().getLength()) );


Based on the accepted answer to: Removing HTML from a Java String

MyHtml2Text parser = new MyHtml2Text();
try {
    parser.parse(new StringReader(myTextPane.getText()));
} catch (IOException ee) {
  //handle exception
}
System.out.println(parser.getText());

Slightly modified version of the Html2Text class found on the answer I linked to

import java.io.IOException;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;

public class MyHtml2Text extends HTMLEditorKit.ParserCallback {
    StringBuffer s;
    public MyHtml2Text() {}
    public void parse(Reader in) throws IOException {
        s = new StringBuffer();
        ParserDelegator delegator = new ParserDelegator();
        delegator.parse(in, this, Boolean.TRUE);
    }
    public void handleText(char[] text, int pos) {
        s.append(text);
        s.append("\n");
    }
    public String getText() {
        return s.toString();
    }
}

If you need a more fine-grained handling consider implementing more of the interface defined by HTMLEditorKit.ParserCallback


You need to do it yourself unfortunately. Imagine if some of the contents was HTML specific, eg images - the text representation is unclear. Include alt text or not for instance.


(Is RegExp allowed? This isn't parsing, isn't it)

Take the getText() result and use String.replaceAll() to filter all tags. Than a trim() to remove leading and trailing whitespaces. For the whitespaces between your first and you last 'blabla' I don't see a general solution. Maybe you can spilt the rest around CRLF and trim all Strings again.

(I'm no regexp expert - maybe someone can provide the regexp and earn some reputation ;) )

Edit

.. I just assumed that you don't use < and > in your text - otherwise it.. say, it's a challenge.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜