
C#: What is the best collection class to store very similar string items for efficient serialization to a file

I would like to store a list of entityIDs of Outlook emails in a file. The entityIDs are strings like:

"000000005F776F08B736B442BCF7B6A7060B509A64002000"
"000000005F776F08B736B442BCF7B6A7060B509A84002000"
"000000005F776F08B736B442BCF7B6A7060B509AA4002000"

As you can see, the strings are very similar. I would like to save them in a collection class that is stored as efficiently as possible when I serialize it to a file. Do you know of any collection class that could be used for this?

Thank you in advance for any information... Gregor


No built-in collection class in the framework will suit your needs, because the framework collections are generic: by design, they know nothing about the type they store (e.g. string), so they cannot exploit its contents.

If efficient serialization is your only concern, I suggest that you simply compress the serialized file. Data like this are a feast for compression algorithms. .NET offers the gzip and deflate algorithms in System.IO.Compression; stronger algorithms, if you need them, are available in third-party libraries.
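A minimal sketch of the compression approach, assuming the IDs are written as plain text, one per line. `CompressedIdFile`, `Save` and `Load` are hypothetical names for illustration; `GZipStream` from `System.IO.Compression` does the actual work, and `DeflateStream` could be swapped in the same way.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: stores one ID per line inside a gzip-compressed file.
public static class CompressedIdFile
{
    public static void Save(string path, IEnumerable<string> ids)
    {
        // Disposing the writer flushes it and closes the GZipStream, which writes the gzip trailer.
        using (var file = File.Create(path))
        using (var gzip = new GZipStream(file, CompressionLevel.Optimal))
        using (var writer = new StreamWriter(gzip, Encoding.ASCII))
        {
            foreach (string id in ids)
                writer.WriteLine(id);
        }
    }

    public static List<string> Load(string path)
    {
        var ids = new List<string>();
        using (var file = File.OpenRead(path))
        using (var gzip = new GZipStream(file, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.ASCII))
        {
            // Each line of the decompressed text is one ID.
            string line;
            while ((line = reader.ReadLine()) != null)
                ids.Add(line);
        }
        return ids;
    }
}
```

Because the IDs share long prefixes, gzip's back-references remove most of the redundancy; sorting the IDs before saving may help further by keeping similar strings close together.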

If in-memory efficiency is also an issue, you could store your strings in a trie or a radix tree.
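To make the trie suggestion concrete, here is a minimal sketch of an in-memory radix tree (a compressed trie). Each edge holds a substring, so a prefix shared by many IDs is stored once. `RadixTree`, `Add`, `Contains` and `StoredCharacters` are hypothetical names chosen for illustration, not a framework API.

```csharp
using System.Collections.Generic;

// Hypothetical radix tree: edges are labelled with substrings, shared prefixes stored once.
public sealed class RadixTree
{
    private sealed class Node
    {
        // Keyed by the first character of each outgoing edge label.
        public readonly Dictionary<char, Edge> Edges = new Dictionary<char, Edge>();
        public bool IsTerminal;
    }

    private sealed class Edge
    {
        public string Label;
        public Node Target;
        public Edge(string label, Node target) { Label = label; Target = target; }
    }

    private readonly Node _root = new Node();

    public void Add(string key)
    {
        Node node = _root;
        int i = 0;
        while (i < key.Length)
        {
            if (!node.Edges.TryGetValue(key[i], out Edge edge))
            {
                // No edge starts with this character: the rest of the key becomes one new leaf edge.
                node.Edges[key[i]] = new Edge(key.Substring(i), new Node { IsTerminal = true });
                return;
            }
            int common = CommonPrefixLength(edge.Label, key, i);
            if (common < edge.Label.Length)
            {
                // Split the edge: the shared part now leads to a new middle node.
                var middle = new Node();
                middle.Edges[edge.Label[common]] = new Edge(edge.Label.Substring(common), edge.Target);
                edge.Label = edge.Label.Substring(0, common);
                edge.Target = middle;
            }
            node = edge.Target;
            i += common;
        }
        node.IsTerminal = true;
    }

    public bool Contains(string key)
    {
        Node node = _root;
        int i = 0;
        while (i < key.Length)
        {
            if (!node.Edges.TryGetValue(key[i], out Edge edge)) return false;
            int common = CommonPrefixLength(edge.Label, key, i);
            if (common < edge.Label.Length) return false; // key diverges or ends inside this edge
            node = edge.Target;
            i += common;
        }
        return node.IsTerminal;
    }

    // Total characters held in edge labels; each shared prefix is counted once.
    public int StoredCharacters() => Count(_root);

    private static int Count(Node node)
    {
        int total = 0;
        foreach (Edge edge in node.Edges.Values)
            total += edge.Label.Length + Count(edge.Target);
        return total;
    }

    private static int CommonPrefixLength(string label, string key, int start)
    {
        int n = 0;
        while (n < label.Length && start + n < key.Length && label[n] == key[start + n]) n++;
        return n;
    }
}
```

Adding the three IDs from the question leaves 64 characters in edge labels instead of 144. Per-object overhead in .NET means the actual memory saving only shows up for large sets of IDs.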


You may want to take a look at the radix trie data structure, which can store your keys efficiently because shared prefixes are stored only once.

As for serialising to a file, you could walk the trie and write out each node. (In the following example I have used indentation to signify the level in the tree, but you could come up with something more compact, such as control characters that signify a descent or ascent.)

00000000
  5F776F08B736B442BCF7B6A7060B509A
    64002000
    84002000
    A4002000
  6F776F08B736B442BCF7B6A7060B509A
    32100000

The example above represents the set:

000000005F776F08B736B442BCF7B6A7060B509A64002000
000000005F776F08B736B442BCF7B6A7060B509A84002000
000000005F776F08B736B442BCF7B6A7060B509AA4002000
000000006F776F08B736B442BCF7B6A7060B509A32100000
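The walk-and-write idea can be sketched in C# as follows. `RadixTrieText`, `Write` and `Read` are hypothetical names. The sketch builds a character trie, merges chains of single-child nodes while writing (which yields the radix-trie nodes), and uses two spaces of indentation per level as in the example. It assumes no ID is a prefix of another ID, which holds when all IDs have the same length, as in the samples.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper: writes IDs as an indented radix trie and reads them back.
// Assumes no ID is a prefix of another ID.
public static class RadixTrieText
{
    private sealed class Node
    {
        // SortedDictionary keeps the output in ordinal character order.
        public readonly SortedDictionary<char, Node> Children = new SortedDictionary<char, Node>();
        public bool IsTerminal;
    }

    public static void Write(IEnumerable<string> ids, TextWriter writer)
    {
        var root = new Node();
        foreach (string id in ids)
        {
            Node node = root;
            foreach (char c in id)
            {
                if (!node.Children.TryGetValue(c, out Node next))
                {
                    next = new Node();
                    node.Children.Add(c, next);
                }
                node = next;
            }
            node.IsTerminal = true;
        }
        WriteChildren(root, 0, writer);
    }

    private static void WriteChildren(Node parent, int depth, TextWriter writer)
    {
        foreach (var pair in parent.Children)
        {
            // Follow single-child, non-terminal nodes to build one radix edge label.
            var label = new StringBuilder().Append(pair.Key);
            Node node = pair.Value;
            while (!node.IsTerminal && node.Children.Count == 1)
            {
                var only = node.Children.First();
                label.Append(only.Key);
                node = only.Value;
            }
            writer.WriteLine(new string(' ', 2 * depth) + label);
            WriteChildren(node, depth + 1, writer);
        }
    }

    public static List<string> Read(TextReader reader)
    {
        var lines = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
            if (line.Length > 0) lines.Add(line);

        var ids = new List<string>();
        var prefixes = new List<string>(); // prefixes[d] = full prefix ending at depth d
        for (int i = 0; i < lines.Count; i++)
        {
            string text = lines[i].TrimStart(' ');
            int depth = (lines[i].Length - text.Length) / 2;
            string prefix = (depth == 0 ? "" : prefixes[depth - 1]) + text;
            if (prefixes.Count > depth) prefixes.RemoveRange(depth, prefixes.Count - depth);
            prefixes.Add(prefix);

            // A line is a complete ID when the next line is not indented deeper.
            int nextDepth = i + 1 < lines.Count
                ? (lines[i + 1].Length - lines[i + 1].TrimStart(' ').Length) / 2
                : -1;
            if (nextDepth <= depth) ids.Add(prefix);
        }
        return ids;
    }
}
```

A fully compressed radix trie merges every single-child chain, so writing the four IDs above puts the last one on a single line (`  6F776F08B736B442BCF7B6A7060B509A32100000`) instead of splitting it as the hand-drawn example does.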


Why is efficiency an issue? Do you want to use as little disk space as possible? (Disk space is cheap.) In C# the two most commonly used serializers are binary and XML.

If you want users to be able to edit the file, for example in Notepad, use XML; if not, use binary.
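Both options can be sketched for a `List<string>` as follows. The classic .NET binary serializer, `BinaryFormatter`, is obsolete in current .NET and Microsoft advises against using it, so this sketch writes the binary file by hand with `BinaryWriter` instead. `IdListFiles` and its methods are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical helper showing an XML file and a compact hand-written binary file.
public static class IdListFiles
{
    // XML: editable in Notepad, but every ID is wrapped in <string> tags.
    public static void SaveXml(string path, List<string> ids)
    {
        var serializer = new XmlSerializer(typeof(List<string>));
        using (var stream = File.Create(path))
            serializer.Serialize(stream, ids);
    }

    public static List<string> LoadXml(string path)
    {
        var serializer = new XmlSerializer(typeof(List<string>));
        using (var stream = File.OpenRead(path))
            return (List<string>)serializer.Deserialize(stream);
    }

    // Binary: a 4-byte count followed by BinaryWriter's length-prefixed strings.
    public static void SaveBinary(string path, List<string> ids)
    {
        using (var writer = new BinaryWriter(File.Create(path), Encoding.UTF8))
        {
            writer.Write(ids.Count);
            foreach (string id in ids)
                writer.Write(id);
        }
    }

    public static List<string> LoadBinary(string path)
    {
        using (var reader = new BinaryReader(File.OpenRead(path), Encoding.UTF8))
        {
            int count = reader.ReadInt32();
            var ids = new List<string>(count);
            for (int i = 0; i < count; i++)
                ids.Add(reader.ReadString());
            return ids;
        }
    }
}
```

`XmlSerializer` produces an `ArrayOfString` document that a user can edit; the binary file is smaller but not human-readable. Either file can still be gzip-compressed afterwards.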
