regex to split line (csv file)

I'm not good with regex. Can someone help me write one for this?

I may have values like this when reading a CSV file:

"Artist,Name",Album,12-SCS
"val""u,e1",value2,value3

Output:

Artist,Name
Album
12-SCS
val"u,e1
value2
value3

Update: I like the idea of using the OLE DB provider. We have a file upload control on the web page, and I read the file's content with a StreamReader without actually saving the file to the file system. Is there any way I can use the OLE DB provider? It needs a file name in the connection string, and in my case the file is not saved on the file system.


Just adding the solution I worked on this morning.

var regex = new Regex("(?<=^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)");

foreach (Match m in regex.Matches("<-- input line -->"))
{
    var s = m.Value; 
}

As you can see, you need to call regex.Matches() once per line. It returns a MatchCollection with one item per column. The Value property of each match is the raw field text: quoted fields still carry their surrounding quotes and any doubled quotes, so strip those if you need the bare value.

This is still a work in progress, but it happily parses CSV strings like:

2,3.03,"Hello, my name is ""Joshua""",A,B,C,,,D
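Since the matches above still include the enclosing quotes, here is a minimal sketch of one way to unquote them. It reuses the same pattern; the class name CsvLineSplitter and its Split method are hypothetical names chosen for illustration, and the sketch assumes each input is a single, well-formed CSV line.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsvLineSplitter
{
    // Same pattern as in the snippet above: either a quoted field
    // (with "" standing for one quote) or a run of non-comma characters.
    private static readonly Regex FieldPattern =
        new Regex("(?<=^|,)(\"(?:[^\"]|\"\")*\"|[^,]*)");

    // Hypothetical helper: splits one CSV line and unquotes each field.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        foreach (Match m in FieldPattern.Matches(line))
        {
            string value = m.Value;
            if (value.Length >= 2 && value[0] == '"' && value[value.Length - 1] == '"')
            {
                // Drop the enclosing quotes, then collapse "" into ".
                value = value.Substring(1, value.Length - 2).Replace("\"\"", "\"");
            }
            fields.Add(value);
        }
        return fields;
    }
}
```

Calling CsvLineSplitter.Split on the second sample line from the question should give the three values val"u,e1, value2 and value3.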


Actually, it's pretty easy to match CSV lines with a regex. Try this one out:

using System.Collections.Specialized;
using System.Text.RegularExpressions;

StringCollection resultList = new StringCollection();
try {
    Regex pattern = new Regex(@"
        # Parse CSV line. Capture next value in named group: 'val'
        (?<=^|,)                 # Start only at line start or after a comma, so
                                 # no spurious empty value is matched at the end.
        \s*                      # Ignore leading whitespace.
        (?:                      # Group of value alternatives.
          ""                     # Either a double quoted string,
          (?<val>                # Capture contents between quotes.
            [^""]*(""""[^""]*)*  # Zero or more non-quotes, allowing 
          )                      # doubled "" quotes within string.
          ""\s*                  # Ignore whitespace following quote.
        |  (?<val>[^,]*)         # Or... zero or more non-commas.
        )                        # End value alternatives group.
        (?:,|$)                  # Match end is comma or EOS", 
        RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
    Match matchResult = pattern.Match(subjectString);
    while (matchResult.Success) {
        // Note: doubled quotes inside a quoted value are still doubled here.
        resultList.Add(matchResult.Groups["val"].Value);
        matchResult = matchResult.NextMatch();
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

Disclaimer: The regex has been tested in RegexBuddy (which generated this snippet), and it correctly matches the OP's test data, but the C# code logic is untested. (I don't have access to C# tools.)


Regex is not a suitable tool for this. Use a CSV parser, either the built-in one or a third-party one.


Give the TextFieldParser class a look. It's in the Microsoft.VisualBasic assembly and does delimited and fixed width parsing.


Give CsvHelper a try (a library I maintain). It's available via NuGet.

You can easily read a CSV file into a custom class collection. It's also very fast.

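// Requires: using CsvHelper; using System.Globalization; using System.IO;
// Any TextReader works here; "file.csv" is just a placeholder path.
var streamReader = new StreamReader("file.csv");
// Recent CsvHelper versions also require a CultureInfo argument.
var csvReader = new CsvReader( streamReader, CultureInfo.InvariantCulture );
var myObjects = csvReader.GetRecords<MyObject>();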


Regex might get overly complex here. Instead, split the line on commas, then iterate over the resulting pieces, joining each piece to the next for as long as the joined string contains an odd number of double quotes.

"hello,this",is,"a ""test"""

...split...

"hello | this" | is | "a ""test"""

...iterate and merge 'til you've an even number of double quotes...

"hello,this" - even number of quotes (note that the comma removed by the split is re-inserted between the pieces)

is - even number of quotes

"a ""test""" - even number of quotes

...then strip off the leading and trailing quote, if present, and replace "" with ".
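The split-and-merge steps above could be sketched in C# roughly as follows. This is only an illustration under assumptions: the names QuoteAwareSplitter, Split and Unquote are hypothetical, the input is a single line, and a trailing piece with unbalanced quotes is simply kept as raw text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QuoteAwareSplitter
{
    // Hypothetical helper: split on commas, then merge pieces until the
    // merged text holds an even number of double quotes.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        string pending = null;
        foreach (string piece in line.Split(','))
        {
            // Re-insert the comma that Split removed when continuing a field.
            pending = pending == null ? piece : pending + "," + piece;
            int quotes = pending.Count(c => c == '"');
            if (quotes % 2 != 0)
            {
                continue; // odd count: still inside a quoted field
            }
            fields.Add(Unquote(pending));
            pending = null;
        }
        if (pending != null)
        {
            fields.Add(pending); // unbalanced quotes: keep the raw text
        }
        return fields;
    }

    // Strip one leading and trailing quote if both are present,
    // then replace "" with ".
    private static string Unquote(string s)
    {
        if (s.Length >= 2 && s[0] == '"' && s[s.Length - 1] == '"')
        {
            s = s.Substring(1, s.Length - 2);
        }
        return s.Replace("\"\"", "\"");
    }
}
```

For the example line above, this should produce the three fields hello,this then is then a "test".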


This can be done with the code below:

using System.IO;
using Microsoft.VisualBasic.FileIO;

// Quotes inside the C# literal are escaped; a quote inside a quoted CSV field is doubled.
string csv = "1,2,3,\"4,3\",\"a,\"\"b\"\",c\",end";
TextFieldParser parser = new TextFieldParser(new StringReader(csv));
// To read from a file instead:
// TextFieldParser parser = new TextFieldParser("csvfile.csv");
parser.HasFieldsEnclosedInQuotes = true;
parser.SetDelimiters(",");
string[] fields = null;
while (!parser.EndOfData)
{
    // Each call returns the fields of the next line.
    fields = parser.ReadFields();
}
parser.Close();