Trade off point between Dictionary(hashtable) and List find
Short question: How many elements can be in a list to make a linear search like List O(n) faster then Dictionary which is O(1)?
I seem to remember in college(and high school) when comparing searching techniques ( binary vs linear) there was always a point where linear was faster. that was O(n) vs O(log)
I can't draw the graph here. but there must be some over head for the constant performance. So the question is if I have 10 items does List.Find mak开发者_运维技巧e more sense then
if (Dictionary.Contains(x))
value = Directiory[x]
Or a Hashtable where value = Hashtable[x] will not fail but will require boxing
I asked myself that same question, and ran a benchmark to find out. I found the speed of a dictionary already outperforms a list in lookup at around 3-4 elements.
That seemed far to few to me based on the overhead of a dictionary, so I checked around to see if anyone else came upon the same results, and that seems to be the results others found as well. http://dotnetperls.com/dictionary-time
That doesn't mean, throw dozens of dictionaries into your code where they don't make sense - there's a memory overhead and construction time to deal with as well. But if you have a decently sized set of keyed data, take advantage of that structure.
Test both methods with your data and make a decision.
With only 10 items in your collection, it's likely the constant cost of hashing will outweigh the cost of looking at each item in your list (worst case). We can't say for sure, though. Test it and find out.
Keep in mind inserting into a Dictionary is also more expensive than inserting into a List. Depending on where and how you're using your collection, an additional insert cost may be undesirable.
So after running some test of my own it looks like Hash-Dictionary always wins. That is a Dictionary object with a HashCode, int32, as a key
10 items 500,000 itterations
Test Name First Last Not Found Average
FindInList 104.26 255.29 254.63 204.73
FindInArray 51.28 192.23 182.91 142.14
FindInHashDict 56.3 54.38 51.16 53.95
FindInDict 105.75 101.38 52.02 86.38
100 items 500,000 itterations
Test Name First Last Not Found Average
FindInList 102.83 1873.45 1820.85 1265.71
FindInArray 56.21 1313.61 1310.65 893.49
FindInHashDict 91.01 53.31 60.46 68.26
FindInDict 119.01 101.65 100.11 106.92
Here is my code that performs the find operation. My objects are hierarchical task that would be searched by a unique name. I know it is a long post but in case someone wanted to dispute the findings they can see the code.
private SearchResult FindInDict()
{
SearchResult result = new SearchResult();
result.SeachType = "FindInDict";
result.itterations = 1;
Stopwatch timer = new Stopwatch();
timer.Start();
if (dictStrBoundryTask.ContainsKey(NameOfFirst))
{
TaskBase t = dictStrBoundryTask[NameOfFirst];
}
timer.Stop();
result.firstItem = timer.Elapsed.TotalMilliseconds;
timer.Reset();
timer.Start();
if (dictStrBoundryTask.ContainsKey(NameOfLast))
{
TaskBase t = dictStrBoundryTask[NameOfLast];
}
timer.Stop();
result.lastItem = timer.Elapsed.TotalMilliseconds;
timer.Reset();
timer.Start();
if (dictStrBoundryTask.ContainsKey(NameOfNotFound))
{
TaskBase t = dictStrBoundryTask[NameOfNotFound];
}
timer.Stop();
result.notFoundItem = timer.Elapsed.TotalMilliseconds;
return result;
}
private SearchResult FindInHashDict()
{
SearchResult result = new SearchResult();
result.SeachType = "FindInHashDict";
result.itterations = 1;
Stopwatch timer = new Stopwatch();
timer.Start();
if (dictIntBoundryTask.ContainsKey(NameOfFirst.GetHashCode()))
{
TaskBase t = dictIntBoundryTask[NameOfFirst.GetHashCode()];
}
timer.Stop();
result.firstItem = timer.Elapsed.TotalMilliseconds;
timer.Reset();
timer.Start();
if (dictIntBoundryTask.ContainsKey(NameOfLast.GetHashCode()))
{
TaskBase t = dictIntBoundryTask[NameOfLast.GetHashCode()];
}
timer.Stop();
result.lastItem = timer.Elapsed.TotalMilliseconds;
timer.Reset();
timer.Start();
if (dictIntBoundryTask.ContainsKey(NameOfNotFound.GetHashCode()))
{
TaskBase t = dictIntBoundryTask[NameOfNotFound.GetHashCode()];
}
timer.Stop();
result.notFoundItem = timer.Elapsed.TotalMilliseconds;
return result;
}
private SearchResult FindInArray()
{
SearchResult result = new SearchResult();
result.SeachType = "FindInArray";
result.itterations = 1;
Stopwatch timer = new Stopwatch();
timer.Start();
foreach (TaskBase t in arrayBoundaryTask)
{
if (t.Name == NameOfFirst)
{
break;
}
}
timer.Stop();
result.firstItem = timer.Elapsed.TotalMilliseconds;
timer.Reset();
timer.Start();
foreach (TaskBase t in arrayBoundaryTask)
{
if (t.Name == NameOfLast)
{
break;
}
}
timer.Stop();
result.lastItem = timer.Elapsed.TotalMilliseconds;
timer.Reset();
timer.Start();
foreach (TaskBase t in arrayBoundaryTask)
{
if (t.Name == NameOfNotFound)
{
break;
}
}
timer.Stop();
result.notFoundItem = timer.Elapsed.TotalMilliseconds;
return result;
}
private SearchResult FindInList()
{
SearchResult result = new SearchResult();
result.SeachType = "FindInList";
result.itterations = 1;
Stopwatch timer = new Stopwatch();
timer.Start();
TaskBase t = listBoundaryTask.Find(x => x.Name == NameOfFirst);
if (t!=null)
{
}
timer.Stop();
result.firstItem = timer.Elapsed.TotalMilliseconds;
timer.Reset();
timer.Start();
t = listBoundaryTask.Find(x => x.Name == NameOfLast);
if (t != null)
{
}
timer.Stop();
result.lastItem = timer.Elapsed.TotalMilliseconds;
timer.Reset();
timer.Start();
t = listBoundaryTask.Find(x => x.Name == NameOfNotFound);
if (t != null)
{
}
timer.Stop();
result.notFoundItem = timer.Elapsed.TotalMilliseconds;
return result;
}
I don't think that there is a hard and fast rule you could apply to come up with a bulletproof metric. You could always imagine a degenerate Hashtable case where you end up with O(n) searches due to collisions, however with realistic data the odds of this are minuscule.
In general case, if fast searching is important for your application then a Hashtable-type construct is always your best choice.
(Hashtables have their own set of drawbacks - they're not a magic bullet. Wikipedia has a good explanation of cases when you might not want to choose a Hashtable: link.)
Dictionary search is always faster then linear search.
精彩评论