
What is a good .NET data structure for finding unique items?

I have a large collection of custom objects that I have retrieved from a query in my system. Let's say these objects all have 5 different properties - FirstName, LastName, Gender, ZipCode and Birthday. For each of those properties I would like to get a list of all of the unique values with their counts, sorted in descending order. It is sort of a faceted navigation system. So if I have, say, 5000 results in my initial query, I would like to be able to display the top 10 FirstNames from most popular to least popular with the count next to each, and then the same for the other properties.

Currently I have a routine that goes through each item one at a time, examines the different properties and keeps a bunch of different hashtables with the information. It works, but it is super slow, and going through each item one at a time does not feel very efficient. Is there some other type of C# structure I could use that would make getting this type of information easier? I know SQL Server does a great job at this sort of thing, but I don't think that is really a possibility here. I'm getting my list of custom objects from the API of a different system, so I would have to take that list and put it into a temp table somehow, which sort of defeats the purpose. Plus, SQL Server temp tables are connection-specific, I think, and my app re-uses connections.

EDIT: What I am trying to avoid is having to iterate through the list and process each individual item. I was wondering if there is some data structure that would let me query the whole list at once (like a database) and get the information. The problem is that our front-end web server is getting hammered because we have a lot of traffic hitting these faceted nav pages, so I am looking for a more efficient way of doing it.

Any ideas?

Thanks, Corey


Unfortunately, I'm pretty sure the answer to your question is, "No." If the only way you have of getting your data is an unindexed List<MyObject>, then something is going to have to go through those items one-by-one and analyze them for Top-N or create indices. Even if you pass that on to another tool (a temp database or third party data structure), you're just putting the processing somewhere else and your CPU will crank just as much. The solution you outline in your original question seems like the most reasonable thing to do.

A few suggestions:

  • Are these Top-N lists the same for all users, or could they be broken into a small number of distinct use cases? You could compute them once and store them in the web cache (see the sketch after this list). Maybe set a background process to refresh them every M minutes to keep them somewhat up-to-date.
  • Is it just a UI perception problem? Could you calculate and display the most important results first, then calculate the others in the background and deliver them to the page asynchronously?
  • Beg the API provider for a more robust way to get results?? :)
  • Throw more hardware at it?? :)
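
For the caching idea in the first bullet, here is a minimal sketch assuming an ASP.NET front end. The cache key, the five-minute TTL and the compute delegate are hypothetical placeholders; compute would be whatever routine currently builds the top-10 list:

    using System;
    using System.Collections.Generic;
    using System.Web;
    using System.Web.Caching;

    public static class FacetCache
    {
        // Hypothetical cache key and refresh interval - tune to your traffic.
        private const string CacheKey = "Facets.TopFirstNames";
        private static readonly TimeSpan Ttl = TimeSpan.FromMinutes(5);

        // 'compute' is whatever routine currently builds the top-10 list.
        public static IList<KeyValuePair<string, int>> GetTopFirstNames(
            Func<IList<KeyValuePair<string, int>>> compute)
        {
            var cached = HttpRuntime.Cache[CacheKey] as IList<KeyValuePair<string, int>>;
            if (cached != null)
                return cached;          // every later request is a cheap cache hit

            var result = compute();     // one expensive pass per TTL window
            HttpRuntime.Cache.Insert(CacheKey, result, null,
                DateTime.UtcNow.Add(Ttl), Cache.NoSlidingExpiration);
            return result;
        }
    }

Under heavy traffic you may also want to guard compute() with a lock so only one request recomputes the list when the cache entry expires.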

Sorry for the non-answer, but I don't think there's a magic bullet here.


i4o - Indexed LINQ (http://www.codeplex.com/i4o) lets you put indexes on objects.

It basically provides RDBMS-style indexing for CLR objects.

Are you using a DBMS for your initial query? If so, why not just write specific SQL queries for each facet?


Keeping one dictionary per property should work fine. How slow is it? Can you show us the code you're using? 5000 items should be processed in the blink of an eye.
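
For reference, a minimal sketch of the dictionary-per-property counting I have in mind, using a hypothetical Person class as a stand-in for your custom object (only FirstName is shown; each of the other properties would get its own dictionary):

    using System.Collections.Generic;
    using System.Linq;

    public class Person
    {
        public string FirstName { get; set; }
        // LastName, Gender, ZipCode, Birthday omitted for brevity
    }

    public static class FacetCounter
    {
        // One pass over the list, one Dictionary<string, int> per property.
        public static IEnumerable<KeyValuePair<string, int>> TopFirstNames(
            IEnumerable<Person> people, int topN)
        {
            var counts = new Dictionary<string, int>();
            foreach (var person in people)
            {
                int current;
                counts.TryGetValue(person.FirstName, out current);
                counts[person.FirstName] = current + 1;
            }

            // Sort by count, descending, and keep the top N.
            return counts.OrderByDescending(kvp => kvp.Value).Take(topN);
        }
    }

For 5000 items something like this should run in a few milliseconds, which is why it would help to see the code you currently have.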

Are you using .NET 3.5? If so, LINQ could help you with a lot of this - in particular, using ToLookup with each property in turn would work pretty well.
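
As a rough sketch of the LINQ route (again using a hypothetical Person stand-in), ToLookup builds the grouping in a single pass and the ordering then picks out the top 10:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class Person
    {
        public string FirstName { get; set; }
        // other properties omitted
    }

    public static class LinqFacets
    {
        public static void PrintTopFirstNames(IEnumerable<Person> people)
        {
            // ToLookup groups every person by first name in one pass over the list.
            ILookup<string, Person> byFirstName = people.ToLookup(p => p.FirstName);

            var top10 = byFirstName
                .OrderByDescending(g => g.Count())
                .Take(10);

            foreach (var group in top10)
                Console.WriteLine("{0}: {1}", group.Key, group.Count());
        }
    }

Repeating the same pattern with p.LastName, p.Gender and so on gives you the other facets.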
