String splitting

I have a string. What is the best way to put the things between the $ signs into a list in Java?

String temp = "$abc$and$xyz$";

How can I get all the variables within the $ signs as a list in Java, i.e. [abc, xyz]?

I can do it using StringTokenizer but want to avoid it if possible. Thanks.


Maybe you could think about calling String.split(String regex) ...


The pattern is simple enough that String.split should work here, but in the more general case, one alternative to StringTokenizer is the much more powerful java.util.Scanner.

    String text = "$abc$and$xyz$";
    Scanner sc = new Scanner(text);

    while (sc.findInLine("\\$([^$]*)\\$") != null) {
        System.out.println(sc.match().group(1));
    } // abc, xyz

The pattern to find is:

\$([^$]*)\$
  \_____/     i.e. literal $, a sequence of anything but $ (captured in group 1)
     1                 and another literal $

The […] is a character class. Something like [aeiou] matches one of any of the lowercase vowels. [^…] is a negated character class. [^aeiou] matches one of anything but the lowercase vowels.

(…) is used for grouping. (pattern) is a capturing group and creates a backreference.

The backslash preceding the $ (outside of a character class definition) is used to escape the $, which has a special meaning as the end-of-line anchor. That backslash is doubled in a String literal: "\\" is a String of length one containing a backslash.

This is not a typical usage of Scanner (usually the delimiter pattern is set, and tokens are extracted using next), but it does show how you'd use findInLine to find an arbitrary pattern (ignoring delimiters), and then use match() to access the MatchResult, from which you can get individual group captures.

You can also use this Pattern in a Matcher find() loop directly.

    Matcher m = Pattern.compile("\\$([^$]*)\\$").matcher(text);
    while (m.find()) {
        System.out.println(m.group(1));
    } // abc, xyz
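Since the question asks for a list rather than printed output, the same find() loop can collect the captures instead. This is only a sketch: the class name DollarPairs and the method name between are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarPairs {
    // Same pattern as above: literal $, anything but $ (group 1), literal $.
    private static final Pattern PAIR = Pattern.compile("\\$([^$]*)\\$");

    // Hypothetical helper: returns the text captured between each pair of $.
    static List<String> between(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = PAIR.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(between("$abc$and$xyz$")); // [abc, xyz]
    }
}
```

Each match consumes both of its $ signs, which is why "and" is skipped: it sits between the closing $ of one pair and the opening $ of the next.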

Related questions

  • Validating input using java.util.Scanner
  • Scanner vs. StringTokenizer vs. String.Split


Just try this one: temp.split("\\$");
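As a rough sketch of what that call gives for the string in the question: the leading $ produces an empty first element, while trailing empty strings are dropped by split. The names SplitDemo and tokens are made up for illustration, and skipping empty tokens is an assumption about what the asker wants.

```java
import java.util.ArrayList;
import java.util.List;

class SplitDemo {
    // Splits on a literal '$' and skips empty tokens (assumption: the
    // zero-length token caused by the leading '$' is not wanted).
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        for (String part : text.split("\\$")) {
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("$abc$and$xyz$")); // [abc, and, xyz]
    }
}
```

Note that a plain split treats every $ as a separator, so "and" is part of the result, unlike the [abc, xyz] shown in the question.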


I would go for a regex myself, like Riduidel said.

This special case is, however, simple enough that you can just treat the String as a character sequence, iterate over it char by char, detect the $ signs, and grab the strings yourself.

On a side note, I would try to go for different demarcation characters, to make it more readable to humans. Use $ as start-of-sequence and something else as end-of-sequence, for instance. Or something like what I think the Bash shell uses: ${some_value}. As said, the computer doesn't care, but you debugging your string just might :)

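As for an appropriate regex, something like (\\$.*\\$)* might be a starting point, but .* is greedy and would run from the first $ to the last one; a negated class, as in \\$([^$]*)\\$, keeps each match between one pair of $ signs. I'm no expert on regexes, though (see http://www.regular-expressions.info for nice info on regexes).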


Basically I'd ditto Khotyn as the easiest solution. I see from your comment on his answer that you don't want zero-length tokens at the beginning and end.

That brings up the question: What happens if the string does not begin and end with $'s? Is that an error, or are they optional?

If it's an error, then just start with:

    if (!text.startsWith("$") || !text.endsWith("$"))
        return "Missing $'s"; // or whatever you do on error

If that passes, fall into the split.

If the $'s are optional, I'd just strip them out before splitting. i.e.:

    if (text.startsWith("$"))
        text = text.substring(1);
    if (text.endsWith("$"))
        text = text.substring(0, text.length() - 1);

Then do the split.
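Put together, the optional-$ variant might look something like the sketch below. The class name StripThenSplit and the method name tokens are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class StripThenSplit {
    // Hypothetical helper: drop one leading and one trailing '$' if present,
    // then split what is left on a literal '$'.
    static List<String> tokens(String text) {
        if (text.startsWith("$")) {
            text = text.substring(1);
        }
        if (text.endsWith("$")) {
            text = text.substring(0, text.length() - 1);
        }
        return Arrays.asList(text.split("\\$"));
    }

    public static void main(String[] args) {
        System.out.println(tokens("$abc$and$xyz$")); // [abc, and, xyz]
        System.out.println(tokens("abc$and$xyz"));   // [abc, and, xyz]
    }
}
```

One edge case to be aware of: if nothing is left after stripping (for example the input is just "$"), split returns a single empty string rather than an empty array.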

Sure, you could make more sophisticated regexes or use StringTokenizer or no doubt come up with dozens of other complicated solutions. But why bother? When there's a simple solution, use it.

PS There's also the question of what result you want to see if there are two $'s in a row, e.g. "$foo$$bar$". Should that give ["foo","bar"], or ["foo","","bar"]? Khotyn's split will give the second result, with zero-length strings. If you want the first result, you should split("\\$+").
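A small comparison of the two patterns, assuming the outer $'s have already been stripped as described earlier. The class and method names (RepeatedDelimiters, keepEmpty, collapse) are made up for illustration.

```java
import java.util.Arrays;

class RepeatedDelimiters {
    // One '$' per delimiter: "$$" yields a zero-length token in between.
    static String[] keepEmpty(String text) {
        return text.split("\\$");
    }

    // One or more '$' per delimiter: a run of '$' counts as one separator.
    static String[] collapse(String text) {
        return text.split("\\$+");
    }

    public static void main(String[] args) {
        String inner = "foo$$bar"; // "$foo$$bar$" with the outer $'s removed
        System.out.println(Arrays.toString(keepEmpty(inner))); // [foo, , bar]
        System.out.println(Arrays.toString(collapse(inner)));  // [foo, bar]
    }
}
```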


If you want a simple split function, use Apache Commons Lang, which has StringUtils.split. The Java one uses a regex, which can be overkill or confusing.


You can do it in a simple manner by writing your own code. Something like the following should do the job:

    import java.util.ArrayList;
    import java.util.List;

    public class MyStringTokenizer {

        public static void main(String[] args) {
            List<String> result = getTokenizedStringsList("$abc$efg$hij$");
            for (String token : result) {
                System.out.println(token);
            }
        }

        // Collects the text between consecutive '$' characters.
        private static List<String> getTokenizedStringsList(String string) {
            List<String> tokenList = new ArrayList<String>();
            char[] in = string.toCharArray();
            int i = 0;
            // Skip anything before the first '$'.
            while (i < in.length && in[i] != '$') {
                i++;
            }
            while (i < in.length) {
                i++; // step past the '$' that opens this token
                StringBuilder myBuilder = new StringBuilder();
                while (i < in.length && in[i] != '$') {
                    myBuilder.append(in[i]);
                    i++;
                }
                // Keep the token only if a closing '$' was found; this avoids
                // the empty token the trailing '$' would otherwise produce.
                if (i < in.length) {
                    tokenList.add(myBuilder.toString());
                }
            }
            return tokenList;
        }
    }


You can use

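    // needs java.util.ArrayList, java.util.List and java.util.regex.Pattern
    String temp = "$abc$and$xyz$";
    String[] array = temp.split(Pattern.quote("$"));
    List<String> list = new ArrayList<String>();
    for (int i = 0; i < array.length; i++) {
        list.add(array[i]);
    }

Now the list has the tokens. Note that the leading $ also produces an empty first element, which you may want to skip.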
