foo.split(',').length != number of ',' found in 'foo'?
Maybe it's because it's end of day on a Friday, and I have already found a work-around, but this is killing me.
I am using Java but am a .NET developer.
I have a string and I need to split it on commas. Let's say it's a row in a CSV file that has 210 columns. line.split(',').length will sometimes be 199, whereas the count of ',' will be 208 or 209. I even counted in two different ways to be sure (using a regex, then, after losing my sanity, manually looping through and checking each character).
What's the super-obvious-hit-face-on-desk thing I'm missing here? Why isn't foo.split(delim).length == CountOfOccurences(foo,delim) all the time, only sometimes?
thanks much
First, there's an obvious difference of one. If there are 200 columns, all with text, there are 199 commas. Second, Java drops trailing empty strings by default. You can change this by passing a negative number as the second argument.
"foo,,bar,baz,,".split(",")
is:
{foo,,bar,baz}
an array of 4 elements. But
"foo,,bar,baz,,".split(",", -1)
is:
{foo,,bar,baz,,}
with all 6.
Note that only trailing empty strings are dropped by default.
Finally, don't forget that the String is compiled into a regex. This is not applicable here, since , is not a special character, but you should keep it in mind.
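To make the regex point concrete, here is a minimal sketch. The class name SplitRegexDemo, the helper splitLiteral, and the sample string are made up for illustration. It shows what happens when the delimiter is a regex metacharacter such as "." and how Pattern.quote makes split treat it literally.

```java
import java.util.regex.Pattern;

public class SplitRegexDemo {
    // Hypothetical helper: split on a literal delimiter by quoting it,
    // and keep trailing empty strings by passing a negative limit.
    static String[] splitLiteral(String line, String delim) {
        return line.split(Pattern.quote(delim), -1);
    }

    public static void main(String[] args) {
        String line = "a.b.c";
        // As a regex, "." matches every character, so every piece is empty
        // and all of them are dropped as trailing empty strings.
        System.out.println(line.split(".").length);          // 0
        // Quoted, "." is an ordinary delimiter.
        System.out.println(splitLiteral(line, ".").length);  // 3
    }
}
```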
There are a couple things happening. First, if you have three items like a,b,c and split on comma, you'll have three entries, one more than the number of commas.
But what you're dealing with probably comes from consecutive delimiters, for example: a,,,,b,c,,,,,
The ones at the end get dropped. Check the Java documentation for the split function. http://download.java.net/jdk7/docs/api/java/lang/String.html
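As a rough sketch of the point above, using the same sample string (the class name SplitCountDemo and the helper countChar are invented for this example): with the default limit the trailing empty strings disappear, while a negative limit restores the expected "number of delimiters plus one" relationship.

```java
public class SplitCountDemo {
    // Hypothetical helper: count occurrences of one delimiter character.
    static int countChar(String s, char delim) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == delim) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String line = "a,,,,b,c,,,,,";
        System.out.println(countChar(line, ','));        // 10
        // Default limit: the five trailing empty strings are dropped.
        System.out.println(line.split(",").length);      // 6
        // Negative limit: nothing is dropped, so length == commas + 1.
        System.out.println(line.split(",", -1).length);  // 11
    }
}
```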
As others have pointed out, String.split
has some very non-intuitive behaviour.
If you're using Google's Guava open-source Java library, there's a Splitter
class which gives a much nicer (in my opinion) API for this, with more flexibility:
String input = "foo, bar,";
Splitter.on(',').split(input);
// returns "foo", " bar", ""
Splitter.on(',').omitEmptyStrings().split(input);
// returns "foo", " bar"
Splitter.on(',').omitEmptyStrings().trimResults().split(input);
// returns "foo", "bar"
Is it omitting blanks?
Do you have something like "a,b,c,,d,e" or trailing delimiters like "a,b,c,,,,"?
Are there extra delimiters in the cell data?
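If the last case applies, a plain split will overcount. The sketch below assumes the file uses double-quoted fields, which the question does not state; the class name QuotedFieldDemo and the sample row are made up.

```java
public class QuotedFieldDemo {
    // Hypothetical helper: naive column count that ignores quoting.
    static int naiveColumnCount(String row) {
        return row.split(",", -1).length;
    }

    public static void main(String[] args) {
        // Three logical columns, but the middle one contains a comma.
        String row = "1,\"Smith, John\",3";
        System.out.println(naiveColumnCount(row)); // 4, not 3
    }
}
```

A real CSV parser would be needed to handle quoted fields correctly; split alone cannot tell a delimiter from a comma inside a cell.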
Short example: foo = "1,2"
and
foo.split(",").length = 2
count(foo, ",") = 1
Probably you have a mistake in your code. Here is an example in Java code:
// needs java.util.regex.Pattern and java.util.regex.Matcher
String row = "1,2,3,4,,5"; // second example: "1,2,3,5,,"
System.out.println(row.split(",").length); // prints 6 for the first example, 4 for the second (trailing empty strings are dropped)
// count how many commas the row contains
Pattern pattern = Pattern.compile(",");
Matcher m = pattern.matcher(row);
int nr = 0;
while (m.find()) {
    nr++;
}
System.out.println(nr); // prints 5 for both examples