Is there added overhead to looking up a column in a DataTable by name rather than by index?
In a DataTable
object, is there added overhead to lo开发者_StackOverflow社区oking up a column value by name thisRow("ColumnA")
rather than by the column index thisRow(0)
? In which scenarios might this be an issue.
I work on a team that has lots of experience writing VB6 code and I noticed that didn't do column lookups by name for DataTable
objects or data grids. Even in .NET code, we use a set of integer constants to reference column names in these types of objects. I asked our team lead why this was so, and he mentioned that in VB6, there was a lot of overhead in looking up data by column name rather than by index. Is this still true for .NET?
Example code (in VB.NET, but same applies to C#):
Public Sub TestADOData()
Dim dt As New DataTable
'Set up the columns in the DataTable '
dt.Columns.Add(New DataColumn("ID", GetType(Integer)))
dt.Columns.Add(New DataColumn("Name", GetType(String)))
dt.Columns.Add(New DataColumn("Description", GetType(String)))
'Add some data to the data table '
dt.Rows.Add(1, "Fred", "Pitcher")
dt.Rows.Add(3, "Hank", "Center Field")
'Method 1: By Column Name '
For Each r As DataRow In dt.Rows
Console.WriteLine( _
"{0,-2} {1,-10} {2,-30}", r("ID"), r("Name"), r("Description"))
Next
Console.WriteLine()
'Method 2: By Column Name '
For Each r As DataRow In dt.Rows
Console.WriteLine("{0,-2} {1,-10} {2,-30}", r(0), r(1), r(2))
Next
End Sub
Is there an case where method 2 provides a performance advantage over method 1?
Yes, there should be a slight overhead connected to looking up columns by name instead of by index. I wouldn't worry about it, unless you keep looking up that same column in a loop, like in your code example. Because then the slight overhead might accumulate to a measurable overhead, depending on the table's number of rows.
The fastest way to access a particular column's value of some row is to lookup using the DataColumn
object itself. For example:
Dim dt As DataTable = ...
Dim idColumn As DataColumn = dt.Columns("ID")
Dim nameColumn As DataColumn = dt.Columns("Name")
Dim descriptionColumn As DataColumn = dt.Columns("Description")
For Each r As DataRow In dt.Rows
' NB: lookup through a DataColumn object, not through a name, nor an index: '
Dim id = r(idColumn)
Dim name = r(nameColumn)
Dim description = r(descriptionColumn)
...
Next
One last piece of advice: I would strongly advise you against using numerical indices! It makes your code more fragile, and also more difficult to understand and maintain: As soon as the logical order of a column changes, you need to adapt your code accordingly, possibly in several places (and you might easily oversee one of them, leading to bugs). If you instead use column names or DataColumn
objects themselves for the lookup, you can change your columns' order without having to change the remaining code.
The facts are
- Getting the column by index is a direct index into an ArrayList
- If the name lookup is case sensitive a hashtable lookup is performed
- If the name lookup is case insensitive then the list is scanned for the name and a more costly locale sensitive string comparison is performed.
So definately the retrieval by index will be the most performant, but does it matter?
Doing some basic (read naive) tests on my machine I found that 1,000,000 accesses to a column using the mentioned indexing mechanism take the following times
- Direct index - 13.3ms
- Case insensitive lookup - 109.11ms
- Case sensitive lookup - 109.24ms
So depending on your scenario you can draw your conclusions.
Yes, there is overhead with looking up a column by name rather than absolute index (as it simply locates the column, then accesses it that way).
That being said, that's fairly premature optimization. The DataTable
will first try to locate the column in a case-sensitive manner, which is very fast. If it can't locate the column that way, it looks in a case-insensitive manner, which is only slightly slower.
The absolute fastest way to access the data is via the DataColumn
object itself, as that's what both the index-based and name-based accessors use.
The best approach would be to find the index by column name and use the index for locating column:
Dim table As DataTable = ...
Dim foo As int = table.Columns("Foo")
For Each row As DataRow In table.Rows
Dim data = row(foo)
Next
You are looking by index and you can also guess from name, which column you are reading. Another advantage is that if you change order of fields in your "select" query, you will still get the correct value. On the other hand, if you hardcode indexes, your code will break.
Depends on the place you use it. On desktop application startup these kind of small inefficiencies may accumulate to long delays. On mouse and keyboard events most likely not. It will most likely be more efficient to spend time on function profiling (printing out execution times to dbgview), than optimizing this kind of low-level stuff.
精彩评论