C#: What is the best collection class to store very similar string items for efficient serialization to a file
I would like to store a list of entityIDs of Outlook emails to a file. The entityIDs are strings like:
"000000005F776F08B736B442BCF7B6A7060B509A64002000" "000000005F776F08B736B442BCF7B6A7060B509A84002000" "000000005F776F08B736B442BCF7B6A7060B509AA4002000"
As you can see, the strings are very similar. I would like to keep them in a collection class that is stored as compactly as possible when I serialize it to a file. Do you know of any collection class that could be used for this?
Thank you in advance for any information... Gregor
No pre-existing collection class from the framework will suit your needs, because these are generic: by definition, they know nothing of the type they are storing (e.g. string) so they cannot do anything with it.
If efficient serialization is your only concern, I suggest that you simply compress the serialized file. Data like this are a feast for compression algorithms. .NET offers gzip and deflate algorithms in System.IO.Compression; better algorithms (if you need them) can easily be found through Google.
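A minimal sketch of the compression approach, assuming the IDs are plain ASCII and written one per line. The class name CompressedIdStore and its two methods are made up for illustration; only GZipStream, StreamWriter and StreamReader come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not a framework class.
public static class CompressedIdStore
{
    // Writes one ID per line through a GZipStream and returns the compressed bytes.
    public static byte[] Compress(IEnumerable<string> ids)
    {
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
            using (var writer = new StreamWriter(gzip, Encoding.ASCII))
            {
                foreach (var id in ids)
                    writer.WriteLine(id);
            }
            // ToArray still works after the stream has been closed.
            return buffer.ToArray();
        }
    }

    // Reverses Compress: decompresses and reads the IDs back line by line.
    public static List<string> Decompress(byte[] data)
    {
        var ids = new List<string>();
        using (var buffer = new MemoryStream(data))
        using (var gzip = new GZipStream(buffer, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.ASCII))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                ids.Add(line);
        }
        return ids;
    }
}
```

To persist the result, something like File.WriteAllBytes(path, CompressedIdStore.Compress(ids)) should do, where path is whatever file you choose. With only a handful of IDs the gzip header overhead can outweigh the savings; the gain shows up on longer lists.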
If in-memory efficiency is also an issue, you could store your strings in a trie or a radix tree.
You may want to take a look at the Radix Trie data-structure, as this would be able to efficiently store your keys.
As far as serialising to a file, you could, perhaps, walk the trie and write down each node. (In the following example I have used indentation to signify the level in the tree, but you could come up with something a bit more efficient, such as using control characters to signify a descent or ascent.)
00000000
  5F776F08B736B442BCF7B6A7060B509A
    64002000
    84002000
    A4002000
  6F776F08B736B442BCF7B6A7060B509A
    32100000
The example above is the set of:
000000005F776F08B736B442BCF7B6A7060B509A64002000
000000005F776F08B736B442BCF7B6A7060B509A84002000
000000005F776F08B736B442BCF7B6A7060B509AA4002000
000000006F776F08B736B442BCF7B6A7060B509A32100000
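The walk described above could be sketched roughly as follows. RadixNode and RadixTrie are hypothetical names, not framework types, and the sketch assumes that no ID is a prefix of another (true for fixed-length IDs), so only leaves are complete keys and the indented dump loses nothing.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical node type: a label fragment plus child nodes.
public sealed class RadixNode
{
    public string Label = "";
    public bool IsKey;
    public readonly List<RadixNode> Children = new List<RadixNode>();
}

public static class RadixTrie
{
    // Inserts key below node, splitting an existing child where the labels diverge.
    public static void Insert(RadixNode node, string key)
    {
        if (key.Length == 0) { node.IsKey = true; return; }
        foreach (var child in node.Children)
        {
            int common = CommonPrefixLength(child.Label, key);
            if (common == 0) continue;
            if (common < child.Label.Length)
            {
                // Keep the shared prefix in child and push the remainder down one level.
                var tail = new RadixNode { Label = child.Label.Substring(common), IsKey = child.IsKey };
                tail.Children.AddRange(child.Children);
                child.Label = child.Label.Substring(0, common);
                child.IsKey = false;
                child.Children.Clear();
                child.Children.Add(tail);
            }
            Insert(child, key.Substring(common));
            return;
        }
        // No child shares a prefix with key: store it as a new leaf.
        node.Children.Add(new RadixNode { Label = key, IsKey = true });
    }

    // Depth-first walk that writes one label per line, two spaces per level.
    public static string Serialize(RadixNode root)
    {
        var sb = new StringBuilder();
        foreach (var child in root.Children) Write(child, 0, sb);
        return sb.ToString();
    }

    private static void Write(RadixNode node, int depth, StringBuilder sb)
    {
        sb.Append(' ', depth * 2).Append(node.Label).Append('\n');
        foreach (var child in node.Children) Write(child, depth + 1, sb);
    }

    private static int CommonPrefixLength(string a, string b)
    {
        int n = Math.Min(a.Length, b.Length), i = 0;
        while (i < n && a[i] == b[i]) i++;
        return i;
    }
}
```

Inserting the four IDs listed above into an empty new RadixNode() and calling RadixTrie.Serialize gives the same shape as the hand-written example, except that the last ID's remainder (6F77...32100000) stays on a single line, because a radix trie does not split a branch that has only one key under it.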
Why is efficiency an issue? Do you want to use as little disk space as possible? (Disk space is cheap.) In C# the two most commonly used serializers are binary and XML.
If you want users to be able to edit the file in Notepad, for example, use XML. If not, use binary.
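For comparison, here is a rough sketch of both options for a List<string>. The XML side uses XmlSerializer. For the binary side it uses BinaryWriter and BinaryReader rather than BinaryFormatter, since BinaryFormatter is marked obsolete in recent .NET versions; that substitution is a choice made for this sketch. The class name IdSerializers is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical helper, not a framework class.
public static class IdSerializers
{
    // Human-readable output that can be edited in a text editor.
    public static byte[] ToXml(List<string> ids)
    {
        var serializer = new XmlSerializer(typeof(List<string>));
        using (var buffer = new MemoryStream())
        {
            serializer.Serialize(buffer, ids);
            return buffer.ToArray();
        }
    }

    // Compact output: an Int32 count followed by length-prefixed strings.
    public static byte[] ToBinary(List<string> ids)
    {
        using (var buffer = new MemoryStream())
        {
            using (var writer = new BinaryWriter(buffer))
            {
                writer.Write(ids.Count);
                foreach (var id in ids) writer.Write(id);
            }
            return buffer.ToArray();
        }
    }

    // Reads back what ToBinary wrote.
    public static List<string> FromBinary(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            int count = reader.ReadInt32();
            var ids = new List<string>(count);
            for (int i = 0; i < count; i++) ids.Add(reader.ReadString());
            return ids;
        }
    }
}
```

The binary form costs one length byte per 48-character ID plus four bytes for the count, while the XML form adds a declaration and a pair of tags around every entry.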