Adding a Way To preserve A Comma In A CSV To DataTable Function
I have a function that converts a .csv file to a datatable. One of the column开发者_运维百科s I am converting is is a field of names that have a comma in them i.e. "Doe, John" when converting the function treats this as 2 seperate fields because of the comma. I need the datatable to hold this as one field Doe, John in the datatable.
Function CSV2DataTable(ByVal filename As String, ByVal sepChar As String) As DataTable
Dim reader As System.IO.StreamReader
Dim table As New DataTable
Dim colAdded As Boolean = False
Try
''# open a reader for the input file, and read line by line
reader = New System.IO.StreamReader(filename)
Do While reader.Peek() >= 0
''# read a line and split it into tokens, divided by the specified
''# separators
Dim tokens As String() = System.Text.RegularExpressions.Regex.Split _
(reader.ReadLine(), sepChar)
''# add the columns if this is the first line
If Not colAdded Then
For Each token As String In tokens
table.Columns.Add(token)
Next
colAdded = True
Else
''# create a new empty row
Dim row As DataRow = table.NewRow()
''# fill the new row with the token extracted from the current
''# line
For i As Integer = 0 To table.Columns.Count - 1
row(i) = tokens(i)
Next
''# add the row to the DataTable
table.Rows.Add(row)
End If
Loop
Return table
Finally
If Not reader Is Nothing Then reader.Close()
End Try
End Function
Don't use a .Split()
function to read your csv data. Not only does it cause the kind of error you just ran into but it's slower as well. You need a state machine -based parser. That will be faster and make it easier to correctly handle quote-enclosed text.
I have an example here:
Reading CSV files in C#
and there's also a highly-respected CSV reader on codeplex you can use:
http://www.codeproject.com/KB/database/CsvReader.aspx
You'd use my code like this:
Function DataTableFromCSV(ByVal filename As String) As DataTable
Dim table As New DataTable
Dim colAdded As Boolean = False
For Each record As IList(Of String) In CSV.FromFile(filename)
''# Add column headers on first iteration
If Not colAdded Then
For Each token As String In record
table.Columns.Add(token)
Next token
colAdded = True
Else
''# add the row to the table
Dim row As DataRow = table.NewRow()
For i As Integer = 0 To table.Columns.Count - 1
row(i) = record(i)
Next
table.Rows.Add(row)
End If
Next record
Return table
End Function
If you're using .net 3.5 or later, I'd write it a little differently to pull the column creation out of the for each loop (using type inference and .Take(1)
), but I wanted to be sure this would work with .Net 2.0 as well.
Instead of rolling out your own solution have you considered using
http://www.filehelpers.net/
It should address your issue.
I can't help you with the VB.NET side of things, but RFC 4180 is your friend. Specifically, section 2:
5. Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields. For example:
"aaa","bbb","ccc" CRLF zzz,yyy,xxx
6. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes. For example:
"aaa","b CRLF bb","ccc" CRLF zzz,yyy,xxx
7. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:
"aaa","b""bb","ccc"
Try splitting with the " and then skipping every second element starting from the first. example:
"Test,Name","123 Street,","NY","12345"
Dim tokens As String() = = System.Text.RegularExpressions.Regex.Split _
(reader.ReadLine(), """") 'Or send in " as the sepchar
You would get
{Length=9}
(0): ""
(1): "Test,Name"
(2): ","
(3): "123 Street,"
(4): ","
(5): "NY"
(6): ","
(7): "12345"
(8): ""
So you would take the Odd numbered elements only to retrieve the data. The only caveat is when there is also a " in the data file.
I still think you should reconsider not using an external library.
Here is a article that addresses it. http://www.secretgeek.net/csv_trouble.asp
Have you looked into using the TextFieldParser class that's built into the .Net framework?
It has a property called HasFieldsEnclosedInQuotes that should handle your situation.
You can set the delimiters, and then call the ReadLine and ReadFields methods to get the field data, and it should account for those fields enclosed in quotation marks.
Like others have said, don't roll your own.
Give CSVHelper a try (a library I maintain). It's separated into a parser and reader, so you can just use the parser if you want. The parsing code is pretty straight forward and RFC 4180 compliant, if you want to look at it's source.
精彩评论