Splitting a CSV and excluding commas within elements

I've got a CSV string and I want to separate it into an array. However, the CSV is a mix of strings and numbers, where the strings are enclosed in quotes and may contain commas.

For example, I might have a CSV as follows:

1,"Hello",2,"World",3,"Hello, World"

I would like it so the string is split into:

1
"Hello"
2
"World"
3
"Hello, World"

If I use String.Split(','); I get:

1
"Hello"
2
"World"
3
"Hello
World"

Is there an easy way of doing this? A library that is already written or do I have to parse the string character by character?


Try the "A Fast CSV Reader" article on Code Project. I've used it happily many times.


String.Split() is icky for this. Not only does it have nasty corner cases where it doesn't work, like the one you just found (and others you haven't seen yet), but performance is less than ideal as well. The FastCSVReader posted by others will work, there's a decent CSV parser built into the framework (Microsoft.VisualBasic.FileIO.TextFieldParser), and I have a simple parser that behaves correctly posted to this question.
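For the built-in option, here is a minimal sketch of splitting a single line with TextFieldParser. The class and method names (`CsvLineSplitter`, `SplitLine`) are made up for illustration, and on .NET Framework it assumes the project references Microsoft.VisualBasic.dll.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper, named for this example only.
public static class CsvLineSplitter
{
    // Splits one CSV line; quoted fields may contain commas.
    public static string[] SplitLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            // ReadFields returns null when there is nothing left to read.
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

Calling `CsvLineSplitter.SplitLine` on the line from the question should give six fields. Note that the parser drops the enclosing quotes, so the last field comes back as `Hello, World` rather than `"Hello, World"`.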


I would suggest using one of the following solutions; I was just testing a few of them (hence the delay):

  1. Regex matching commas not found within enclosing double quotes
  2. A Fast CSV Reader - for reading CSV only
  3. FileHelpers Library 2.0 - for reading/writing CSV
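As a rough sketch of option 1, the pattern below splits on a comma only when an even number of double quotes follows it up to the end of the line, which means the comma sits outside any quoted field. The names `RegexCsvSplitter` and `SplitLine` are invented for this example, and the pattern assumes the quotes in the line are balanced.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, named for this example only.
public static class RegexCsvSplitter
{
    // A comma followed by an even number of double quotes (possibly zero)
    // before the end of the line is outside any quoted field.
    private static readonly Regex CommaOutsideQuotes =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    // Splits the line on unquoted commas; enclosing quotes are kept.
    public static string[] SplitLine(string line)
    {
        return CommaOutsideQuotes.Split(line);
    }
}
```

Unlike a real CSV parser, this keeps the enclosing quotes on each field, which happens to match the output asked for in the question. The lookahead rescans the rest of the line for every comma, so it is better suited to short lines than to large files.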

Hope this helps.


It's not the most elegant solution, but it's the quickest if you just want to copy and paste code (avoiding having to import DLLs or other code libraries):

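    // Requires: using System.Collections.Generic;
    // Splits a delimited line; fields that start with a double quote may contain
    // the delimiter. Enclosing quotes are dropped; doubled quotes ("") are kept as-is.
    private string[] splitQuoted(string line, char delimiter)
    {
        List<string> list = new List<string>();
        while (true)
        {
            if (line.StartsWith("\""))
            {
                line = line.Substring(1);
                // Find the closing quote, skipping escaped (doubled) quotes.
                int idx = line.IndexOf('"');
                while (idx != -1 && idx + 1 < line.Length && line[idx + 1] == '"')
                {
                    idx = line.IndexOf('"', idx + 2);
                }
                if (idx == -1)
                {
                    list.Add(line); // unterminated quote: take the rest of the line
                    break;
                }
                list.Add(line.Substring(0, idx));
                if (idx + 1 >= line.Length)
                {
                    break; // the closing quote ended the line
                }
                line = line.Substring(idx + 2); // skip the closing quote and the delimiter
            }
            else
            {
                int idx = line.IndexOf(delimiter);
                if (idx == -1)
                {
                    list.Add(line); // last field
                    break;
                }
                list.Add(line.Substring(0, idx));
                line = line.Substring(idx + 1);
            }
        }
        return list.ToArray();
    }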