How can I parse a string that has a text qualifier?
How can I parse
String str = "abc, \"def,ghi\"";
so that the output is
String[] strs = {"abc", "\"def,ghi\""};
i.e. an array of length 2?
Should I use a regular expression, or is there a method in the Java API or an open-source project that does this?
Edit:
To give some context: I am reading a text file that has one record per line. Each record is a list of fields separated by a delimiter (comma or semicolon). I now need to support a text qualifier, the way Excel and OpenOffice do. Suppose I have the record
abc, "def,ghi"
Here , is the delimiter and " is the text qualifier, so parsing this string should yield two fields, abc and def,ghi, not three fields {abc,def,ghi}.
I hope this clarifies the requirement.
Thanks
Shekhar
The basic algorithm is not too complicated (it needs java.util.ArrayList and java.util.List):

    public static List<String> customSplit(String input) {
        List<String> elements = new ArrayList<String>();
        StringBuilder elementBuilder = new StringBuilder();
        boolean isQuoted = false;
        for (char c : input.toCharArray()) {
            if (c == '\"') {
                // Toggle the quoted state. The quote is deliberately not skipped,
                // so it stays in the output, as the OP asked in a comment.
                isQuoted = !isQuoted;
            }
            if (c == ',' && !isQuoted) {
                // An unquoted comma ends the current element.
                elements.add(elementBuilder.toString().trim());
                elementBuilder = new StringBuilder();
                continue;
            }
            elementBuilder.append(c);
        }
        // Add the last element, which has no trailing comma.
        elements.add(elementBuilder.toString().trim());
        return elements;
    }
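Since the question mentions both comma and semicolon delimiters, here is a rough variant of the loop above with the delimiter and qualifier passed in as parameters. It is only a sketch under a few assumptions that are not in the original answer: the class name QualifiedSplitter and the method split are made up for illustration, the qualifiers are stripped from the output, a doubled qualifier inside a quoted section is treated as a literal qualifier (the convention spreadsheet exports commonly use), and an unterminated qualifier is reported as an error.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the JDK or any library.
class QualifiedSplitter {

    // Splits one record into fields. Delimiters inside a qualified (quoted)
    // section are kept as data; the qualifiers themselves are dropped.
    static List<String> split(String line, char delimiter, char qualifier) {
        List<String> fields = new ArrayList<String>();
        StringBuilder field = new StringBuilder();
        boolean inQualified = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == qualifier) {
                boolean doubled = inQualified
                        && i + 1 < line.length()
                        && line.charAt(i + 1) == qualifier;
                if (doubled) {
                    // Assumed convention: "" inside a quoted section means one literal ".
                    field.append(qualifier);
                    i++; // skip the second qualifier of the pair
                } else {
                    inQualified = !inQualified;
                }
            } else if (c == delimiter && !inQualified) {
                // An unqualified delimiter ends the current field.
                fields.add(field.toString().trim());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        if (inQualified) {
            throw new IllegalArgumentException("Unterminated text qualifier in: " + line);
        }
        // Add the last field, which has no trailing delimiter.
        // Note: trim() also removes leading/trailing whitespace that was inside the qualifiers.
        fields.add(field.toString().trim());
        return fields;
    }
}
```

For the record from the question, QualifiedSplitter.split("abc, \"def,ghi\"", ',', '"') yields the two fields abc and def,ghi. If the qualifiers should stay in the output, as in the first version of the question, append the qualifier character in the toggle branch as well.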
This question seems relevant: Split a string ignoring quoted sections
Along the same lines, opencsv (http://opencsv.sourceforge.net/) looks like a good fit.
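Try this. The lookahead lets the split match a comma only when an even number of quotes follows it, i.e. when the comma is outside a quoted section:

    String str = "abc, \"def,ghi\"";
    // A comma plus surrounding whitespace, followed by balanced quotes up to the end of the string.
    String regex = "\\s*,\\s*(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";
    for (String s : str.split(regex)) {
        System.out.println(s);
    }

For this input it prints abc and "def,ghi" on separate lines. It assumes the quotes in the input are balanced.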
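Try (needs java.util.LinkedList and java.util.List):

    List<String> res = new LinkedList<String>();
    // The limit of -1 keeps trailing empty chunks, so the parity check below is reliable.
    String[] chunks = str.split("\"", -1);
    if (chunks.length % 2 == 0) {
        // Mismatched quotes!
    }
    for (int i = 0; i < chunks.length; i++) {
        if (i % 2 == 1) {
            // Odd chunks were between quotes: keep them whole.
            res.add(chunks[i]);
        } else {
            // Even chunks were outside quotes: split them on the delimiter.
            for (String piece : chunks[i].split(",")) {
                if (!piece.trim().isEmpty()) {
                    res.add(piece.trim());
                }
            }
        }
    }

This only splits the portions that are not between quotes; quoted portions are kept whole, without their quotes.
Pieces outside quotes are trimmed and blank pieces are skipped, so genuinely empty fields are lost with this approach.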