StringTokenizer split at "<br/>"

Maybe I am stupid, but I don't understand the behaviour of StringTokenizer here:

import java.util.StringTokenizer;
import static org.apache.commons.lang.StringEscapeUtils.escapeHtml;

String object = (String) value;
String escaped = escapeHtml(object);
StringTokenizer tokenizer = new StringTokenizer(escaped, escapeHtml("<br/>"));

If, for example, value is

Hej<br/>$user.get(0).name Har vundet<br/><table border='1'><tr><th>Name</th><th>Played</th><th>Brewed</th></tr>#foreach( $u in $user )<tr><td>$u.name</td> <td>$u.played</td> <td>$u.brewed</td></tr>#end</table><br/>

Then the result is

Hej
$use
.
e
(0).name Ha
 vunde
a
e 
o
de
='1'
h
Name
h
h
P
ayed
h
h
B
ewed
h
#fo
each( $u in $use
 )
d
$u.name
d

d
$u.p
ayed
d

d
$u.
ewed
d
#end
a
e

It makes no sense to me.

How can I make it behave as I expect?


From the documentation:

The characters in the delim argument are the delimiters for separating tokens. Delimiter characters themselves will not be treated as tokens.

In other words, since escapeHtml("<br/>") returns "&lt;br/&gt;", the characters that tell StringTokenizer when to separate the string are:

  • &
  • l
  • t
  • ;
  • b
  • r
  • /
  • g

When it encounters any of those characters in the string (the variable escaped in your code), the StringTokenizer instance splits there and discards the delimiter character. You can confirm this by noting that the letter r does not occur in the output.

Use String.split instead, as others suggest.
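
To see the per-character behaviour in isolation, here is a minimal sketch. The input string and the helper name tokens are made up for illustration; only StringTokenizer itself comes from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDemo {
    // Hypothetical helper: collects every token StringTokenizer yields
    // for the given input and delimiter set.
    static List<String> tokens(String input, String delimiters) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // "<br/>" is read as five separate delimiters: '<', 'b', 'r', '/', '>'.
        // Every 'b' and 'r' in the input is therefore dropped as well.
        System.out.println(tokens("brown<br/>bear", "<br/>")); // prints [own, ea]
    }
}
```

The letters b and r vanish from "brown" and "bear" even though neither word contains the full "<br/>" sequence, which is the same effect seen in the question's output.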


Each character in the delimiter string is treated as a separate delimiter. So your code splits on each "&", "l", "t", ";", "b", "r", "/" or "g" character (since escapeHtml replaces "<" and ">" with &lt; and &gt; respectively).

You probably want to use String.split which takes a regular expression as the thing to split on:

String[] parts = object.split("<br/>");

or

String[] parts = escaped.split(escapeHtml("<br/>"));

Just make sure that there are no regex special characters (such as ".", "*", "|" or parentheses) in your split token, because split interprets it as a regular expression.
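
If the delimiter might contain regex special characters, one option is to wrap it in java.util.regex.Pattern.quote so that split treats it as literal text. This is only a sketch; the class name, the helper splitLiteral and the sample strings are invented for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitLiteralDemo {
    // Hypothetical helper: splits on the delimiter taken literally,
    // so regex metacharacters inside it have no special meaning.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // '.' and '|' are regex metacharacters; quoting makes them literal.
        System.out.println(Arrays.toString(splitLiteral("a.|b.|c", ".|"))); // prints [a, b, c]
        // "<br/>" has no metacharacters, so quoting is harmless here.
        System.out.println(Arrays.toString(splitLiteral("Hej<br/>Har vundet<br/>", "<br/>"))); // prints [Hej, Har vundet]
    }
}
```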


If you want to split a string on a whole word rather than on individual characters, you are better off using String.split.

I've done the test:

public static void main(String[] args){
    String s = "Hej<br/>$user.get(0).name Har vundet<br/><table border='1'><tr><th>Name</th><th>Played</th><th>Brewed</th></tr>#foreach( $u in $user )<tr><td>$u.name</td> <td>$u.played</td> <td>$u.brewed</td></tr>#end</table><br/>";
    String[] lines = s.split("<br/>");
    for(String ss:lines)
        System.out.println(ss);
}

and here you have the result:

Hej
$user.get(0).name Har vundet
<table border='1'><tr><th>Name</th><th>Played</th><th>Brewed</th></tr>#foreach( $u in $user )<tr><td>$u.name</td> <td>$u.played</td> <td>$u.brewed</td></tr>#end</table>

Tjena


StringTokenizer treats each character of the delimiter string as a separate delimiter.

You need to use split instead (be careful, though, as it takes a regular expression).

String[] lines = "some html string<br/>with line breaks<br/>".split("<br/>");


You cannot use StringTokenizer with a multi-character delimiter. One possible solution to your problem is to replace "<br/>" with a single character that you can guarantee will not appear in your string, and then use StringTokenizer with that character as the delimiter.
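
A rough sketch of that idea follows. It assumes the control character U+0001 never occurs in the input; that choice, the class name and the helper splitOnBr are illustrative assumptions, not something from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class SentinelTokenizerDemo {
    // Assumption: the control character U+0001 never appears in the text.
    private static final String SENTINEL = String.valueOf((char) 1);

    // Hypothetical helper: swaps each "<br/>" for the sentinel,
    // then tokenizes on that single character.
    static List<String> splitOnBr(String input) {
        // String.replace works on literal text, not on a regex.
        String marked = input.replace("<br/>", SENTINEL);
        StringTokenizer tokenizer = new StringTokenizer(marked, SENTINEL);
        List<String> parts = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            parts.add(tokenizer.nextToken());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitOnBr("Hej<br/>Har vundet<br/>")); // prints [Hej, Har vundet]
    }
}
```

Note that StringTokenizer never returns empty tokens, so two consecutive "<br/>" markers collapse into one break, whereas String.split would yield an empty string between them.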
