How do I correctly handle CR when reading text files with OleDB
I have text files that are Tab delimited. I created a Schema.ini like so:
[MY_FILE.TAB]
Format=TabDelimited
ColNameHeader=False
Col1=id Short
Col2=data Text
This is the code I use to read it (C#):
using (var connection = new OleDbConnection(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\FolderToData\;Extended Properties='text;FMT=delimited'"))
{
using (var command = new OleDbCommand("SELECT * FROM MY_FILE.TAB", connection))
{
var table = new DataTable();
using (var adapter = new OleDbDataAdapter(command)
{
adapter.Fill(table);
}
}
}
Everything works fine, except for one thing. The data in the text file contains Carriage Returns [CR]. Records itself are separated by Carriage Return Line Feeds [CR][LF]. Unfortunately, OleDB / MicrosoftJet (or whatever parses these files) treats both ([CR], [CRLF]) the same.
Example of MY_FILE.TAB (there should be a Tab between numbers and text):
1 One[CR][LF]
2 Two[CR][LF]
3 Th[CR]
ree[CR][LF]
4 Four[CR][LF]
Gives me 5 (malformed) Rows in the DataTable instead of 4.
What I need is:
1 "One"
2 "Two"
3 "Th\nree"
4 "Four2
But I get:
1 "One"
2 "Two"
3 "Th"
null null
4 "Four"
"ree" can't be converted to Int32 so first colum in fourth row is nu开发者_开发问答ll.
How can I configure OleDB to treat [CR] different than [CR][LF]? Or any other Ideas?
I don't believe you can reconfigure OLEDB to do this directly.
An alternative approach would be to use a TextReader and TextWriter to process the file into a temporary file, scanning for and replacing CR alone into some special escape sequence. Then use OLEDB to read this replacement temporary file; finally, replace the special escape sequence back to a CR.
Wouldn't it be easy to read the file contents in a string, split it by Environment.NewLine
or \r\n
, which will get you an array for each line, which you can further split by tab
?
精彩评论