How to split a string based on punctuation marks and whitespace?

I have a String that I want to split based on punctuation marks and whitespace. What should be the regex argument to the split() method?


Here is code with some weirdness-handling thrown in. It skips empty tokens in the output loop, which is quick and dirty. You can add whatever characters you need to split on (and remove) to the regex pattern. (tchrist is right about \s: by default Java's \s matches only ASCII whitespace, [ \t\n\x0B\f\r], unless Unicode character classes are enabled, so it misses characters such as the no-break space and only works in simple cases.)

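public class SomeClass {
    public static void main(String[] args) {
        String input = "The\rquick!brown  - fox\t\tjumped?over;the,lazy\n,,..  \nsleeping___dog.";

        // Split on any single punctuation character (\p{P}, which includes - and _)
        // or on a space, tab, newline or carriage return.
        for (String s : input.split("[\\p{P} \\t\\n\\r]")) {
            // Adjacent delimiters produce empty tokens; skip them.
            if (s.isEmpty()) continue;
            System.out.println(s);
        }
    }
}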


INPUT:

The
quick!brown  - fox      jumped?over;the,lazy
,,..  
sleeping___dog.

OUTPUT:

The
quick
brown
fox
jumped
over
the
lazy
sleeping
dog
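
A variant that avoids the empty-token check and makes \s cover non-ASCII whitespace could look like the minimal sketch below. It assumes Java 8 or later (for streams); the class name UnicodeSplit and the tokenize helper are hypothetical. The (?U) embedded flag turns on Pattern.UNICODE_CHARACTER_CLASS, so \s also matches characters such as U+00A0 (no-break space), and the + quantifier merges runs of delimiters into a single split point.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class UnicodeSplit {
    // (?U) enables UNICODE_CHARACTER_CLASS: \s becomes Unicode-aware.
    // \p{P} is any Unicode punctuation; "+" treats a run of delimiters as one.
    private static final Pattern DELIMITERS = Pattern.compile("(?U)[\\p{P}\\s]+");

    public static List<String> tokenize(String input) {
        return Arrays.stream(DELIMITERS.split(input))
                // A leading delimiter still yields one empty first token; drop it.
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "The\rquick!brown  - fox\t\tjumped?over;the,lazy\n,,..  \nsleeping___dog.";
        System.out.println(tokenize(input));
        // [The, quick, brown, fox, jumped, over, the, lazy, sleeping, dog]

        // U+00A0 is not matched by the default (ASCII-only) \s, but is matched here.
        System.out.println(tokenize("no\u00A0break"));
        // [no, break]
    }
}
```

String.split (and Pattern.split) already drops trailing empty strings, so only a leading delimiter needs the filter.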


Try something like this:

String myString = "item1, item2, item3";
String[] tokens = myString.split(", ");
for (String t : tokens) {
    System.out.println(t);
}

/*output
item1
item2
item3
*/
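
Note that split(", ") only matches a comma followed by exactly one space, so "item1,item2" or "item1 ,  item2" would not split cleanly. A minimal sketch of a more tolerant pattern, where the class name CommaSplit and the helper splitList are hypothetical:

```java
import java.util.Arrays;

public class CommaSplit {
    // "\\s*,\\s*" matches a comma with any amount of whitespace on either side.
    // trim() removes whitespace at the very start and end of the whole string.
    public static String[] splitList(String s) {
        return s.trim().split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        String myString = "item1,item2 ,  item3";
        System.out.println(Arrays.toString(splitList(myString)));
        // [item1, item2, item3]
    }
}
```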


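str.split("[ ,.!?;]+")

would be a good start for English. The square brackets matter: without them, " ,.!?;" is matched as a sequence, and . and ? are regex metacharacters, so it would not split on each of those characters individually. You will need to refine the set based on what you see in your data and on the language of the text.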
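
A small sketch of how that character class behaves, including one edge case worth knowing: a delimiter at the very start of the string still produces an empty first element. The class name EnglishSplit and the tokens helper are hypothetical.

```java
import java.util.Arrays;

public class EnglishSplit {
    // Inside [...], . ? and ! are literal characters; "+" merges runs such as ", ".
    private static final String DELIMS = "[ ,.!?;]+";

    public static String[] tokens(String s) {
        return s.split(DELIMS);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokens("Hello, world! How are you? Fine.")));
        // [Hello, world, How, are, you, Fine]

        // Trailing empty strings are dropped by split, but a leading one is kept.
        System.out.println(Arrays.toString(tokens("...so what?")));
        // [, so, what]
    }
}
```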
