Regarding the Java split command when parsing a CSV file
I have a CSV file in the following format:
H,"TestItems_20100107.csv",07/01/2010,20:00:00,"TT1198","MOBb","AMD",NEW,,
I need the split command to ignore commas inside double quotes, so I used the split command below, taken from an earlier post (its title follows the code).
String items[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
System.out.println("items.length"+items.length);
Java: splitting a comma-separated string but ignoring commas in quotes
When I run this on the CSV data above, items.length is printed as 8. The last two commas at the end of the line, after NEW, are ignored. I want the split command to pick up these commas and return a length of 10. It does not pick up empty fields at the end of the line, but it does pick them up in the middle of the string. I'm not sure what I need to change in the split command to resolve this. Also, in the CSV file, double quotes within the contents of a text field can be repeated (e.g. "This account is a ""large"" one").
There's nothing wrong with the regular expression. The problem is that split discards empty matches at the end:
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
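A workaround is to supply a limit argument greater than the number of columns you expect in your CSV file (a negative limit such as -1 also keeps the trailing empty strings, without having to guess a column count):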
String[] tokens = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", 99);
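To make the effect of the limit argument concrete, here is a minimal sketch that runs the regex from the question against the sample line with three different limits. The class name SplitLimitDemo and the helper countFields are made up for illustration; only String.split(String, int) comes from the JDK.

```java
public class SplitLimitDemo {
    // Regex from the question: a comma followed by an even number of quotes up to end of line.
    static final String CSV_REGEX = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: number of fields split() returns for a given limit.
    static int countFields(String line, int limit) {
        return line.split(CSV_REGEX, limit).length;
    }

    public static void main(String[] args) {
        String line = "H,\"TestItems_20100107.csv\",07/01/2010,20:00:00,"
                + "\"TT1198\",\"MOBb\",\"AMD\",NEW,,";

        // limit 0 is what the one-argument split uses: trailing empty strings are dropped.
        System.out.println("limit 0:  " + countFields(line, 0));
        // A positive limit larger than the column count keeps trailing empty strings.
        System.out.println("limit 99: " + countFields(line, 99));
        // A negative limit keeps them too, with no cap on the array length.
        System.out.println("limit -1: " + countFields(line, -1));
    }
}
```

With the sample line this should report 8 fields for limit 0 and 10 fields for both 99 and -1. The negative limit is probably the safer choice when the number of columns is not known in advance.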
I came across this same problem today and found a simple solution for CSV files: add an extra field containing just one space at the time the split is executed:
(line + ", ").split(",");
This way, no matter how many consecutive empty fields there are at the end of a line, split() always returns n+1 fields: the n real fields plus the extra one holding the space.
Example session (using bsh)
bsh % line = "H,\"TestItems_20100107.csv\",07/01/2010,20:00:00,\"TT1198\",\"MOBb\",\"AMD\",NEW,,";
bsh % System.out.println(line);
H,"TestItems_20100107.csv",07/01/2010,20:00:00,"TT1198","MOBb","AMD",NEW,,
bsh % String[] items = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
bsh % System.out.println(items.length);
8
bsh % items = (line + ", ").split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
bsh % System.out.println(items.length - 1 );
10
bsh %
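On the doubled quotes mentioned in the question: the lookahead only counts quotes, so a "" pair inside a quoted field keeps the count even and should not confuse the split. Below is a rough sketch, not a full CSV parser, that splits with a negative limit and then strips the outer quotes and collapses "" to ". The class QuotedCsvDemo, the helpers unquote and parseLine, and the sample line (the question's example text with a comma added) are all assumptions made for illustration. It does not handle quoted fields that span several lines.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedCsvDemo {
    // Regex from the question: a comma followed by an even number of quotes up to end of line.
    static final String CSV_REGEX = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: drop the surrounding quotes and turn "" back into ".
    static String unquote(String field) {
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            return field.substring(1, field.length() - 1).replace("\"\"", "\"");
        }
        return field; // unquoted fields are returned unchanged
    }

    // Hypothetical helper: split one line, keeping trailing empty fields (limit -1).
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        for (String raw : line.split(CSV_REGEX, -1)) {
            fields.add(unquote(raw));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample: a quoted field with doubled quotes and an embedded comma.
        String line = "1,\"This account is a \"\"large\"\" one, really\",x,,";
        List<String> fields = parseLine(line);
        System.out.println(fields.size());
        System.out.println(fields.get(1));
    }
}
```

For anything beyond simple files, a dedicated CSV library is likely to be more robust than a regex split.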