How to roll my own index in c#?
I need a faster way to create an index file. The application generates pairs of items to be index开发者_如何学JAVAed. I currently add each pair as it is generated to a sorted dictionary and then write it out to a disk file. This works well until the number of items added exceeds one million, at which time it slows to the point that is unacceptable. There can be as many as three million data items to be indexed. I prefer to avoid a database because I do not want to significantly increase the size of the deployment package, which is now less than one-half of one megabyte. I tried Access but it is even slower than the sorted dictionary -if it had an efficient bulk load utility then that might work, but I do not find such a tool for Access.
Is there a better way to roll my own index?
Is the SortedDictionary really the bottleneck? As compared to the I/O ?
You really should profile this first to prevent optimizing the wrong parts.
But as a tip, when you have 1M or more items it is a good idea to pre-allocate your Dictionary. Give it an initial capacity of 2M or so.
//var index = new SortedDictionary(2 * 1024 * 1024); // not supported, it's a tree
var index = new SortedList(2 * 1024 * 1024);
If your Dictionary is the problem I would expect it to be from constant re-allocation sooner than from the actual index-searches.
Just a thought, but could you use an in-memory SQL solution, like SQL Lite ? It's just a small DLL, but will help your priorities, do your logic in C# and do your sorting in SQL.
Have a look here:
http://web.archive.org/web/20100208133236/http://www.mikeduncan.com/sqlite-on-dotnet-in-3-mins/
The Download for SQL Lite itself is only 253k, and the .net bindings are about 75k.
Is SQLite too big to deploy with your software? I'm going to agree with Henk that the constant reallocations within the SortedDictionary are probably the bottleneck. If that solution proves false, try using SQLite to see if that increases performance, and then you can decide on where to go from there.
精彩评论