In VB.NET, what data structure to use so that, given two characters, return an integer for optimum performance?
I am finishing up a program that does a large number of calculations and am trying to optimize the innermost loop.
The calculation I am currently looking at iterates over a large number of pairs of words and makes a table of the counts of corresponding pairs of characters. For example, one pair of words might be:
voice
louse
and the character pairs would then be (v,l), (o,o), (i,u), (c,s), and (e,e), and these pairs would all then have a count of 1. If the combination (v,l) is ever encountered again in another word, it would increment that count to two.
What data structure should I use for highest performance? Given the two characters, I need to retrieve the count for that pair. Currently I am using a nested hash table whose declaration looks like:
Dim data As New Dictionary(of String, Dictionary(of String, Integer))
Using this data structure, the program must hash two strings for every integer it retrieves. For every ch开发者_高级运维aracter pair, it must first check to see if the pair is in the hash table, and if not add it, requiring two more hashes. I have also considered a one level hash table with the key being the two characters concatenated together, so key = "vl" and value = 1, but I have read that string concatenation is relatively slow in VB.
So then, my questions are:
How speedy are the Dictionaries in VB? Would four hashes be quicker than one hash and a string concatenation (two level vs one level hash table)?
Can you think of a better structure to store this kind of data that allows fast additions and retrieval?
One option is to use a Dictionary(Of Integer, Integer)
. You can convert from any .NET Unicode character to an unsigned 16 bit integer, as they're UTF-16 code units.
You can then combine two unsigned 16 bit integers into a 32 bit integer very easily. Alternatively, you can just convert each code unit into a 32 bit unsigned integer to start with, and shift and combine in the same way :)
In C# I'd just use:
int combination = (((int) char1) << 16) | ((int) char2);
EDIT: According to jeroenh's comment, the VB equivalent is:
Dim combination As Integer = (AscW(char1) << 16) Or AscW(char2)
精彩评论