CSV + FileHelpers + Double Quotes = Nightmare
I can't seem to handle a CSV I got. It's a file generated by a bank, which looks like this:
"000,""PLN"",""XYZ"",""2011-08-31"",""2011-08-31"",""0,00"""
1,""E"",""2011-08-30"",""2011-08-31"",""2011-08-31"",""399,00"",""0000103817846977"",""UZNANIE OTRZYMANE ELIXIR"",""23103015080000000550217023"",""XXX"",""POLISA UBEZPIECZENIA NR XXX "",""000""
3,""E"",""2011-08-31"",""2011-08-31"",""2011-08-31"",""1433,00"",""0000154450232753"",""UZNANIE OTRZYMANE ELIXIR"",""000"",""XXX"",""POLISA UBEZPIECZENIA XXX "",""000""
(I changed all sensitive information).
I've been trying to parse it since morning, but no luck. I tried the LINQ to CSV example found somewhere on the net and the CodeProject one (both threw an error saying the CSV is corrupted), and I ended up with FileHelpers, which SEEMS to work, BUT:
- It splits the "399,00" and similar values into two fields.
- When I use the [FieldQuoted()] attribute it all goes to hell, since all the fields are quoted in DOUBLE quotation marks. I suspect that is the reason why the other parsers wouldn't work.
Any ideas how to handle it?
If the problem is the doubled quotes, you could preprocess each line, replacing every pair of double quotes with a single double quote:
line = line.Replace( "\"\"", "\"" );
Once the whole file has been processed, you can hand it to any other CSV processor. It will probably be easier to write your own, anyway.
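A rough sketch of that preprocessing step, assuming the whole line may also be wrapped in one extra pair of quotes, as the first sample line is. The names BankCsvPreprocessor and Unwrap are made up for illustration. It would mangle a file in which a field legitimately contains an escaped quote, so treat it as a starting point only.

```csharp
using System;

static class BankCsvPreprocessor
{
    // Hypothetical helper: removes one layer of quoting from a single line.
    public static string Unwrap(string line)
    {
        // Assumption: a line that both starts and ends with a quote is
        // wrapped as a whole, so drop that outer pair first.
        if (line.Length >= 2 && line.StartsWith("\"") && line.EndsWith("\""))
        {
            line = line.Substring(1, line.Length - 2);
        }

        // Collapse every doubled quote into a single one.
        return line.Replace("\"\"", "\"");
    }
}
```

The cleaned lines can then be written to a temporary file or passed directly to whichever parser you prefer, with quoted fields enabled so that the comma in values such as "399,00" is kept.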
I have been using Lumen, CommonLibrary, FileHelpers etc. and I ended up with the TextFieldParser class (from the Visual Basic namespace, Microsoft.VisualBasic.FileIO, but it can be used in C# without any problem). I recommend you try that. The only downside is that it's relatively slow, but it seems to cope with edge cases quite well.
I even came up with a trick to get it to work with obviously invalid CSV files (""" etc.; OpenOffice Calc couldn't handle them properly): when I encountered such a line and got a MalformedLineException, I'd parse it again within the catch block, this time with the HasFieldsEnclosedInQuotes property set to false.
It would split the line properly, just leaving all the values wrapped in double quotes. All I had to do then was remove those double quotes "manually".
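A minimal sketch of that fallback, assuming each record sits on a single line so it can be parsed on its own. LenientCsvReader, ParseLine and ParseLenient are hypothetical names; TextFieldParser, MalformedLineException and HasFieldsEnclosedInQuotes are the real members of Microsoft.VisualBasic.FileIO (on .NET Framework this needs a reference to Microsoft.VisualBasic.dll). Keep in mind that with quote handling switched off every comma acts as a delimiter, so a value like 399,00 would still be split in two.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.VisualBasic.FileIO;

static class LenientCsvReader
{
    // Hypothetical helper: parses one line with the given quote handling.
    static string[] ParseLine(string line, bool fieldsInQuotes)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = fieldsInQuotes;
            // ReadFields returns null when there is nothing to read.
            return parser.ReadFields() ?? new string[0];
        }
    }

    // Tries a normal parse first and falls back to quote-agnostic splitting.
    public static string[] ParseLenient(string line)
    {
        try
        {
            return ParseLine(line, true);
        }
        catch (MalformedLineException)
        {
            // Fallback: treat quotes as ordinary characters,
            // then strip the leftover quotes from each value by hand.
            return ParseLine(line, false)
                .Select(field => field.Trim('"'))
                .ToArray();
        }
    }
}
```

Creating a parser per line is slower than reading the whole file through one instance, but it keeps the retry logic easy to follow.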