IEnumerable as DataTable performance issue
I have the following extension method, which generates a DataTable from an IEnumerable:
public static DataTable AsDataTable<T>(this IEnumerable<T> enumerable)
{
    DataTable table = new DataTable();
    T first = enumerable.FirstOrDefault();
    if (first == null)
        return table;
    PropertyInfo[] properties = first.GetType().GetProperties();
    foreach (PropertyInfo pi in properties)
        table.Columns.Add(pi.Name, pi.PropertyType);
    foreach (T t in enumerable)
    {
        DataRow row = table.NewRow();
        foreach (PropertyInfo pi in properties)
            row[pi.Name] = t.GetType().InvokeMember(pi.Name, BindingFlags.GetProperty, null, t, null);
        table.Rows.Add(row);
    }
    return table;
}
However, on large amounts of data, the performance isn't very good. Are there any obvious performance fixes that I'm missing?
First, a couple of non-perf problems:
- The type of the first item in the enumerable might be a subclass of T that defines properties that might not be present on other items. To avoid problems that this may cause, use the T type as the source of the properties list.
- The type might have properties that either have no getter or that have an indexed getter. Your code should not attempt to read their values.
On the perf side of things, I can see potential improvements on both the reflection and the data table loading sides of things:
- Cache the property getters and invoke them directly.
- Avoid accessing the data row columns by name to set the row values.
- Place the data table in "data loading" mode while adding the rows.
With these mods, you would end up with something like the following:
public static DataTable AsDataTable<T>(this IEnumerable<T> enumerable)
{
    if (enumerable == null)
    {
        throw new ArgumentNullException("enumerable");
    }
    DataTable table = new DataTable();
    if (enumerable.Any())
    {
        // Skip indexers and properties without a public getter
        // (CanRead alone is also true for a non-public getter).
        IList<PropertyInfo> properties = typeof(T)
            .GetProperties()
            .Where(p => p.GetGetMethod() != null && (p.GetIndexParameters().Length == 0))
            .ToList();
        foreach (PropertyInfo property in properties)
        {
            // DataTable columns cannot be Nullable<T>, so use the underlying type.
            table.Columns.Add(property.Name, Nullable.GetUnderlyingType(property.PropertyType) ?? property.PropertyType);
        }
        // Look the getters up once instead of once per item.
        IList<MethodInfo> getters = properties.Select(p => p.GetGetMethod()).ToList();
        table.BeginLoadData();
        try
        {
            object[] values = new object[properties.Count];
            foreach (T item in enumerable)
            {
                for (int i = 0; i < getters.Count; i++)
                {
                    values[i] = getters[i].Invoke(item, BindingFlags.Default, null, null, CultureInfo.InvariantCulture);
                }
                // Add by ordinal position rather than by column name.
                table.Rows.Add(values);
            }
        }
        finally
        {
            table.EndLoadData();
        }
    }
    return table;
}
Instead of doing:
row[pi.Name] = t.GetType().InvokeMember(pi.Name, BindingFlags.GetProperty, null, t, null);
use:
row[pi.Name] = pi.GetValue(t, null);
You could always use a library like Fasterflect to emit IL instead of using true reflection for every property on every item in the list. I'm not sure about any gotchas with the DataTable.
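As a rough illustration of the "emit IL once, then call a delegate" idea, here is a minimal sketch that uses the built-in System.Linq.Expressions API rather than Fasterflect (whose API is not shown here). The class name `CompiledGetters` and its `For<T>` method are hypothetical names chosen for this example. Only public instance properties with a public getter and no index parameters are included.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical helper: builds one compiled delegate per readable property of T.
public static class CompiledGetters
{
    public static Func<T, object>[] For<T>()
    {
        return typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.GetGetMethod() != null && p.GetIndexParameters().Length == 0)
            .Select(p => BuildGetter<T>(p))
            .ToArray();
    }

    // Compiles the equivalent of: (T item) => (object)item.Property
    private static Func<T, object> BuildGetter<T>(PropertyInfo property)
    {
        ParameterExpression item = Expression.Parameter(typeof(T), "item");
        // Convert boxes value types so every getter has the same delegate shape.
        Expression body = Expression.Convert(Expression.Property(item, property), typeof(object));
        return Expression.Lambda<Func<T, object>>(body, item).Compile();
    }
}
```

The compilation cost is paid once per type. Inside the row loop, each value would then be read with `getters[i](item)` instead of a reflective `Invoke`.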
Alternatively, if this code is not trying to be a generic solution, you could always have whatever type is within the IEnumerable translate itself to a DataRow, thus avoiding reflection altogether.
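A minimal sketch of that non-generic approach might look like the following. The `IRowSource` interface, the `Person` type and the `PersonTable` helper are all hypothetical, invented only to show the shape of the idea: the type hands over its own values in column order, so no reflection is involved.

```csharp
using System.Collections.Generic;
using System.Data;

// Hypothetical contract: a type that can produce its own row values.
public interface IRowSource
{
    object[] ToRowValues();
}

// Hypothetical item type used for illustration.
public sealed class Person : IRowSource
{
    public string Name { get; set; }
    public int Age { get; set; }

    // Values are returned in the same order as the columns defined below.
    public object[] ToRowValues()
    {
        return new object[] { Name, Age };
    }
}

public static class PersonTable
{
    public static DataTable Build(IEnumerable<Person> people)
    {
        DataTable table = new DataTable();
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Age", typeof(int));

        table.BeginLoadData();
        try
        {
            foreach (Person person in people)
            {
                table.Rows.Add(person.ToRowValues());
            }
        }
        finally
        {
            table.EndLoadData();
        }
        return table;
    }
}
```

The trade-off is that the column list and the value order have to be kept in sync by hand for every type.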
You may not have a choice about this, but possibly look at the architecture of the code to see if you can avoid using a DataTable and rather return an IEnumerable<T> yourself.
The main reasons for doing that would be:
You are going from an IEnumerable to a DataTable, which is effectively going from a streamed operation to a buffered operation.
Streamed: typically implemented with yield return, so results are pulled off the enumeration only as and when they are needed, rather than the whole collection being iterated up front.
Buffered: pulls all of the results into memory (e.g. a populated collection, DataTable or array), so all of the expense is incurred at once.
If you can use an IEnumerable return type, then you can make use of the yield return keyword yourself, meaning you spread the cost of all that reflection out instead of incurring it all at once.
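To make the streamed alternative concrete, here is a small sketch, assuming the consumer can work with one `object[]` of property values per item instead of a DataTable. The `StreamingRows` class and `AsRows` method are hypothetical names. The argument check sits in a separate non-iterator method so that it runs immediately rather than on first enumeration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical streamed counterpart to AsDataTable.
public static class StreamingRows
{
    public static IEnumerable<object[]> AsRows<T>(this IEnumerable<T> source)
    {
        if (source == null)
        {
            throw new ArgumentNullException("source");
        }
        return Iterate(source);
    }

    private static IEnumerable<object[]> Iterate<T>(IEnumerable<T> source)
    {
        // Property lookup happens once, when enumeration starts.
        PropertyInfo[] properties = typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.GetGetMethod() != null && p.GetIndexParameters().Length == 0)
            .ToArray();

        foreach (T item in source)
        {
            object[] values = new object[properties.Length];
            for (int i = 0; i < properties.Length; i++)
            {
                values[i] = properties[i].GetValue(item, null);
            }
            // Each row is produced only when the caller asks for the next one.
            yield return values;
        }
    }
}
```

With this shape, a caller that only needs the first few rows (for example via `Take(10)`) pays the reflection cost for those rows only.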