Splitting string on commas when data can contain commas

I have a CSV file (which I didn't design and I can't change now nor will I ever be able to change it) that contains lines like the following:

"Surname, Firstname", yes, no, somestring, whatever, etc

As you can see here, the first , is not a comma on which I'd want to split the string. Notice that this particular comma is enclosed within the quotation marks.

Because of this, a simple string.Split(',') obviously won't work, as it would give me an array of length 7 for the above string instead of 6.

Is there a way to get around this? I was thinking of using regex to split the string instead but I'm not competent enough in regex to think of a pattern that would only split on commas that are not enclosed inside quotation marks.

I can think of ugly, hacky ways to do it by reading each string char by char but this would have to be a last resort as I'm sure there's a better way to do it!


You can handle this easily by using the TextFieldParser class. Just set HasFieldsEnclosedInQuotes to true.
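As a rough sketch of what that could look like, the snippet below wraps a single line in a StringReader and lets TextFieldParser split it. The class name QuotedCsv and the method SplitLine are made up for illustration, and the project is assumed to have access to the Microsoft.VisualBasic assembly (on .NET Framework you may need to add the reference yourself).

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper, not part of any library.
public static class QuotedCsv
{
    // Splits one CSV line, treating commas inside double quotes as data.
    public static string[] SplitLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;  // delimited, not fixed-width
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;      // quoted commas stay in the field
            parser.TrimWhiteSpace = true;                 // drop the space after each comma

            // ReadFields returns null when there is nothing left to read.
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

Passing the sample line from the question to QuotedCsv.SplitLine should give six fields, with Surname, Firstname as the first one. To read a whole file, you would construct the parser from the file path instead and call ReadFields in a loop until EndOfData is true.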


I would suggest using a CSV parser library - there are other cases that you wouldn't have thought of (new line as part of a quoted field).

The Microsoft.VisualBasic.FileIO namespace has a class that can help: TextFieldParser.


I know a lot of people here think character-by-character comparisons should never be used, and they will strongly disagree with me, but I'm not convinced that companies like Microsoft are the only ones who should be doing that sort of programming.

After all, Split does character-by-character comparisons, so why is it any less ugly to call existing code that doesn't do exactly what you want?

At any rate, my approach was to write my own code. And I've posted the code online at http://www.blackbeltcoder.com/Articles/files/reading-and-writing-csv-files-in-c.
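To give an idea of the character-by-character approach, here is a minimal sketch. It is not the code from the linked article; the names ManualCsv and SplitLine are invented for this example. It assumes each record sits on a single line, that fields are quoted with double quotes, and that a doubled quote inside a quoted field stands for a literal quote.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, written for illustration only.
public static class ManualCsv
{
    // Walks the line once, tracking whether we are inside a quoted field.
    public static List<string> SplitLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"');   // doubled quote means a literal quote
                    i++;                   // skip the second quote
                }
                else if (c == '"')
                {
                    inQuotes = false;      // closing quote
                }
                else
                {
                    current.Append(c);     // commas in here are plain data
                }
            }
            else if (c == '"')
            {
                inQuotes = true;           // opening quote
            }
            else if (c == ',')
            {
                // Field separator. Trim also strips whitespace that was
                // inside the quotes, a simplification made for this sketch.
                fields.Add(current.ToString().Trim());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString().Trim()); // last field has no trailing comma
        return fields;
    }
}
```

This handles the sample line from the question, but it does not cover a quoted field that spans several lines, which is one reason a full parser is usually the safer choice.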
