
Parsing a large CSV file, dealing with commas and quotes

I need to load in a large CSV file (>1MB) and parse it. Generally this is quite easy to do by splitting first on line breaks and then on commas. The problem, though, is that some entries contain strings that include their own commas. When the spreadsheet is converted to CSV, the fields containing commas are wrapped in quotes.

I've written a parser that first escapes all the commas inside these quoted strings, then splits on line breaks and then on commas, and finally unescapes the values again.

This is quite a slow process for such a long string, as I need to iterate through the whole string. Does anyone know a faster or more optimised method of dealing with this?


Have you had a look at csvlib yet? It is a parser library for ActionScript 3. It claims to be designed to properly handle quoted strings.

Hopefully, you are already enclosing your strings in quotes, especially the ones containing the commas. CSV parsers cannot distinguish a comma that is part of a string from a comma that separates two strings, unless the strings have quotes around them.

    
Good
    "This string, has a comma", "This string doesn't"

Bad
    This string, has a comma, this string doesn't
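If you control the export, a value can be written in the "Good" form with a small helper. This is a minimal sketch assuming the common RFC 4180 convention: a field that contains a comma, a quote or a line break is wrapped in double quotes, and any quote inside it is doubled. `QuoteCsvField` is a hypothetical name, not part of any library.

```vb
Imports System
Imports Microsoft.VisualBasic

Module CsvQuoting
    ' Hypothetical helper: quotes a value for CSV output (RFC 4180 style).
    Public Function QuoteCsvField(value As String) As String
        ' Only fields containing a comma, quote, CR or LF need quoting.
        Dim needsQuotes As Boolean = value.IndexOfAny(New Char() {","c, """"c, ControlChars.Cr, ControlChars.Lf}) >= 0
        If Not needsQuotes Then Return value
        ' Double every embedded quote, then wrap the whole value in quotes.
        Return """" & value.Replace("""", """""") & """"
    End Function
End Module
```

For example, `QuoteCsvField("This string, has a comma")` yields `"This string, has a comma"`, while `This string doesn't` is returned unchanged.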


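Processing the file in a single pass, instead of escaping, splitting and unescaping, will reduce the time. This can be achieved with a simple state machine that tracks whether the current character is inside a quoted value, so that commas embedded in the values are treated as data rather than as separators.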
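Such a state machine could look roughly like this. It is a minimal sketch, not a complete RFC 4180 parser, and it assumes a comma delimiter, optional double quotes around fields, `""` as an escaped quote inside a quoted field, and rows ending in LF or CRLF. `ParseCsv` and `CsvStateMachine` are hypothetical names.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports Microsoft.VisualBasic

Module CsvStateMachine
    ' Hypothetical single-pass parser: each character is examined exactly once.
    Public Function ParseCsv(text As String) As List(Of List(Of String))
        Dim rows As New List(Of List(Of String))()
        Dim row As New List(Of String)()
        Dim field As New StringBuilder()
        Dim inQuotes As Boolean = False   ' the only state the machine needs
        Dim i As Integer = 0

        While i < text.Length
            Dim c As Char = text(i)
            If inQuotes Then
                If c = """"c Then
                    If i + 1 < text.Length AndAlso text(i + 1) = """"c Then
                        field.Append(""""c)   ' "" is an escaped quote
                        i += 1                 ' skip the second quote
                    Else
                        inQuotes = False       ' closing quote
                    End If
                Else
                    field.Append(c)            ' commas and line breaks are data here
                End If
            Else
                Select Case c
                    Case """"c
                        inQuotes = True        ' opening quote
                    Case ","c
                        row.Add(field.ToString())   ' end of field
                        field.Clear()
                    Case ControlChars.Cr
                        ' ignored: the LF of a CRLF pair ends the row
                    Case ControlChars.Lf
                        row.Add(field.ToString())   ' end of field and of row
                        field.Clear()
                        rows.Add(row)
                        row = New List(Of String)()
                    Case Else
                        field.Append(c)
                End Select
            End If
            i += 1
        End While

        ' Flush the last row when the text does not end with a line break.
        If field.Length > 0 OrElse row.Count > 0 Then
            row.Add(field.ToString())
            rows.Add(row)
        End If
        Return rows
    End Function
End Module
```

Because the parser carries its quote state across line breaks, a quoted value that contains a line break is also kept intact, which splitting on line breaks first cannot do. For a file of a few MB, the text can be read once (for example with `File.ReadAllText`) and passed in whole.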


  • Add a reference to the Microsoft.VisualBasic assembly (yes, it says VisualBasic, but it works just as well from C#; in the end it is all just IL)
  • Use the Microsoft.VisualBasic.FileIO.TextFieldParser class to parse the CSV file

Here is the sample code:

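    Imports Microsoft.VisualBasic.FileIO

    Module Program
        Sub Main()
            ' Using closes the file even if an exception is thrown
            Using parser As New TextFieldParser("C:\mar0112.csv")
                parser.TextFieldType = FieldType.Delimited
                parser.SetDelimiters(",")
                ' Treat "a, b" as one field (True is also the default)
                parser.HasFieldsEnclosedInQuotes = True

                While Not parser.EndOfData
                    ' ReadFields returns the fields of the current row
                    Dim fields() As String = parser.ReadFields()
                    For Each field As String In fields
                        'TODO: Process field
                    Next
                End While
            End Using
        End Sub
    End Module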