Performance of my implementation of IComparer
I'm sorting a collection of type "Document" (usually around 100k records). Sorting usually takes around 4-5 seconds, and I'm wondering if there's a way to speed up sorting by modifying my "DocumentComparer" class which implements IComparer(Of Document). Since the Compare() method would be called hundreds of thousands of times, are there any performance improvements that could be made here that I've overlooked?
Public Class DocumentComparer
Implements IComparer(Of Document)
Private _Prop As PropertyDescriptor
Private _Properties As PropertyDescriptorCollection = TypeDescriptor.GetProperties(GetType(Document))
Private _SortDirection As ListSortDirection
Private _PropertyType As Type
Public ReadOnly Property Prop As PropertyDescriptor
Get
Return _Prop
End Get
End Property
Public ReadOnly Property SortDirection As ListSortDirection
Get
Return _SortDirection
End Get
End Property
Public Sub New(ByVal prop As PropertyDescriptor, ByVal sortDirection As ListSortDirection)
_Prop = prop
_SortDirection = sortDirection
End Sub
Public Function Compare(ByVal x As Document, ByVal y As Document) As Integer Implements System.Collections.Generic.IComparer(Of Document).Compare
Dim xPropertyValue As Object
Dim yPropertyValue As Object
Dim compareValue As Integer = 0
Try
xPropertyValue = _Prop.GetValue(x)
yPropertyValue = _Prop.GetValue(y)
If _Prop.Name = "Revision" Then
' When sorting by the revision, actually sort by the RevisionSort property from the Document class.
_Prop = _Properties.Item("RevisionSort")
xPropertyValue = _Prop.GetValue(x)
yPropertyValue = _Prop.GetValue(y)
compareValue = xPropertyValue.ToString().CompareTo(yPropertyValue.ToString())
ElseIf _Prop.Name = "ReleaseDate" Then
If xPropertyValue Is Nothing And yPropertyValue Is Nothing Then
compareValue = 0
ElseIf xPropertyValue Is Nothing Then
compareValue = -1
ElseIf yPropertyValue Is Nothing Then
compareValue = 1
Else
compareValue = DirectCast(xPropertyValue, DateTime).CompareTo(DirectCast(yPropertyValue, DateTime))
End If
ElseIf xPropertyValue Is Nothing And yPropertyValue Is Nothing Then
Return 0
ElseIf xPropertyValue Is Nothing And yPropertyValue IsNot Nothing Then
compareValue = -1
ElseIf xPropertyValue IsNot Nothing And yPropertyValue Is Nothing Then
compareValue = 1
ElseIf _Prop.PropertyType Is GetType(String) Then
' If we are sorting string values...that's easy. Just call the String's CompareTo() method.
compareValue = xPropertyValue.ToString().CompareTo(yPropertyValue.ToString())
ElseIf _Prop.PropertyType Is GetType(DateTime) Then
' If we are sorting by a DateTime column then just call the DateTime's CompareTo() method.
compareValue = DirectCast(xPropertyValue, DateTime).CompareTo(DirectCast(yPropertyValue, DateTime))
ElseIf _Prop.PropertyType Is GetType(Integer) Then
compareValue = DirectCast(xPropertyValue, Integer).CompareTo(DirectCast(yPropertyValue, Integer))
Else
' Future expansion of comparison of different types
Throw New NotImplementedException("Datatype of column cannot be compared.")
E开发者_Go百科nd If
Catch ex As Exception
DocDbModelException.AddObjectToExceptionData(Me, ex)
Throw New DocDbModelException(String.Format("Failed comparing objects: {0}", ex.Message), ex, True)
End Try
If _SortDirection = ListSortDirection.Ascending Then
Return compareValue
Else
Return -compareValue
End If
End Function
End Class
The big problems in this code that jump out at me are:
- Using reflection
- Big if/else lists
- Performing checks in the
Compare
method (which gets called hundreds of thousands of times) which could have been determined when the object was initialized, or at least lazily once theCompare
method gets called the first time
So, for example, how about something like this?
Public Class DocumentComparer
Implements IComparer(Of Document)
Private _Prop As PropertyDescriptor
Private _SortDirection As ListSortDirection
Private _Comparer As Func(Of Document, Document, Integer)
Public ReadOnly Property Prop() As PropertyDescriptor
Get
Return _Prop
End Get
End Property
Public ReadOnly Property SortDirection() As ListSortDirection
Get
Return _SortDirection
End Get
End Property
Shared _ComparersByName As ConcurrentDictionary(Of String, Func(Of Document, Document, Integer))
Shared Sub New()
Dim dict = New Dictionary(Of String, Func(Of Document, Document, Integer))() From { _
{"Revision", Function(x, y) String.Compare(x.RevisionSort, y.RevisionSort)}, _
{"ReleaseDate", Function(x, y) Nullable.Compare(Of DateTime)(x.ReleaseDate, y.ReleaseDate)} _
' add remaining sort functions here
}
_ComparersByName = New ConcurrentDictionary(Of String, Func(Of Document, Document, Integer))(dict)
End Sub
Public Sub New(prop As PropertyDescriptor, sortDirection As ListSortDirection)
_SortDirection = sortDirection
_Comparer = _ComparersByName(prop.Name)
End Sub
Public Function Compare(x As Document, y As Document) As Integer
Try
Dim compareResult As Integer = _Comparer(x, y)
Return If(_SortDirection = ListSortDirection.Ascending, compareResult, -compareResult)
Catch ex As Exception
Throw New Exception(String.Format("Failed comparing objects: {0}", ex.Message), ex)
End Try
End Function
End Class
Update
It would be more complicated if you don't want to handle each property explicitly, but there are certainly options. You could create another Shared/static dictionary to use as a fallback, which is keyed on the property Type rather than its name. Since you wouldn't know the property at compile time, the value of each key couldn't just be the simple comparison function like you see in the example above. Instead, you'd have to make a choice:
- The value could be function that could build an expression tree when given the property, which you could then compile into a comparison function (see http://msdn.microsoft.com/en-us/library/bb397951.aspx). Compiling the function in the first place is an expensive process, so you might want to have another Shared/static dictionary that acts as a cache for the compiled function for each property. Pro: Once you've compiled the function, it will run very quickly. Con: Generating expression trees is tedious, and usually involves a lot of code.
- The value could be a function that uses reflection, similar to the code you use in your original question. Pro: Code is simpler, and you pretty much already have the code you need. Con: While this does eliminate the big if/else list, it probably won't give you nearly as much of a performance difference because you're still using reflection.
One final note:
- Use
string.Compare
andNullable.Compare<T>
where appropriate to avoid complicating your code with null checks. When comparing non-nullable types (int
,DateTime
, etc.), don't bother with null checks, since they aren't necessary.
Don't initialize _Properties
until its actually needed, that might help a bit. TypeDescriptor
uses reflection which always adds to perf.
In fact, (looking at your other post), do you have an actual need to use TypeDescriptor
or can you just use the real properties's attributes without reflection. Reflection is like trying to find the color of your shoes by taking them off and holding them to your eyes when instead you could just look down.
精彩评论