Retrieve not empty rows in first column of Excel-sheet via OledbCommand
my requirement is to read all rows of an excel-sheet in first column that are not empty, are numeric and have a length between 15 and 20.
For example:
358218033354974
359473035499561
358218036156129
354022038366247
358218032490035
359473030516492
353210040325399
This column might have a header that is not numeric and there might be empty rows. This would cause an exception when importing them via SqlBulkCopy. So i want to prefilter the correct rows via OleDbCommand.
What i have tried so far is following(C# is also appreciated):
Using connection As New OleDbConnection(sExcelConnectionString)
connection.Open()
Dim schemaTable As DataTable = _
connection.GetOle开发者_开发知识库DbSchemaTable(OleDbSchemaGuid.Tables, _
New Object() {Nothing, Nothing, Nothing, "TABLE"})
For Each row As DataRow In schemaTable.Rows
'loop every Worksheet
Dim OleDbCmd As OleDbCommand = New OleDbCommand("SELECT * FROM [" & row("TABLE_NAME").ToString & "]", connection)
Dim dr As OleDbDataReader = OleDbCmd.ExecuteReader()
Dim bulkCopy As SqlBulkCopy = New SqlBulkCopy(sSqlConnectionString)
bulkCopy.DestinationTableName = destTable
bulkCopy.WriteToServer(dr)
Next
End Using
But this throws an exception if there are empty rows or the format of the value is incorrect. So my question is:
Q: How to restrict the rows of OleDbCommand to:
- get only first column of every worksheet
- all values that are numeric
- skip empty values
- ideally only the values with a length between 15 and 20
Edit: if somebody could only show me a way how to skip the empty rows, i would be happy. Or do i have to to select the whole datatable? I hoped it would be possible to do that only with one query because of performance reasons.
Thank you in advance.
This is my solution, maybe it'll help somebody in some way:
Using connection As New OleDbConnection(sExcelConnectionString)
connection.Open()
Dim schemaTable As DataTable = _
connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, _
New Object() {Nothing, Nothing, Nothing, "TABLE"})
Dim schemaColTable As DataTable = _
connection.GetOleDbSchemaTable(OleDbSchemaGuid.Columns, _
New Object() {Nothing, Nothing, Nothing, Nothing})
Dim firstWorkSheet As String = schemaTable.Rows(0)("TABLE_NAME").ToString
Dim firstColumnName As String = schemaColTable.Rows(0)("COLUMN_NAME").ToString
Dim OleDbSQL As String = String.Format( _
"SELECT TRIM([{1}]) AS IMEI " & _
"FROM [{0}] " & _
"WHERE LEN(TRIM([{1}])) BETWEEN 10 AND 15 " & _
"ORDER BY [{1}]", firstWorkSheet, firstColumnName)
Dim OleDbCmd As OleDbCommand = New OleDbCommand(OleDbSQL, connection)
Using bulkCopy As New SqlBulkCopy(sSqlConnectionString)
bulkCopy.DestinationTableName = destTable
bulkCopy.WriteToServer(OleDbCmd.ExecuteReader)
End Using
End Using
Convert to C#
I do not see why you should not use SQL:
"SELECT F1 from [Sheet1$] WHERE Len(F1)>14 AND Len(F1)<21 AND IsNumeric(F1)"
When HDR is set to NO in the connection string, fields names are assigned as F1, F2 etc, numbering from the first column of the selection. It is also possible to use [Sheet1$A:A] if you are sure that there is data in column A. Note that both named ranges and sheets are returned in the table schema.
精彩评论