开发者

Using ASCII character sum for binary-searching a string?

I'm doing a mini project - student database using linked list, part of my first semester. The specification is that, user should be able to search a record using name initials, which is a char[4] in the structure.

Now there are two ways to search the initials, one by linear search which is indeed inefficient (I don't care actually about that, because this is not going to be some firm's basic stuff, etc.) or by binary search.

Binary search requires sorted arrays, so I was thinking if searching using the ASCII sum of the string would make any sense?

For example, record 1 has initial = "AB" and record 2 has "CD". ASCII sums of both are 65+66 = 131 & 67+68 = 135 and the list is sorted using initials (using strcmp).

So when the user inputs "AB", I'll just look for the number 131 and if present, show 开发者_JS百科the record?

This might be a very bad idea, please don't flame on me and explain why it is a bad one.


Seems like a good start to me. How will you distinguish between "TON" and "NOT" Will they sum to the same value (a 'collision')? Are you suggesting a two tiered approach? First with the ascii-sum search and second with some method to sort out the collisions? Seem like there's some good info on hashing here: http://burtleburtle.net/bob/hash/index.html


If i have understood correctly, then it would be a very faulty way of searching for initials. The first problem i see is:

AD = 65+68 = 133
BC = 66+67 = 133

Turns out that they really aren't distinguishable. But what is wrong with comparing two letters, or even, perhaps just concatenating the ASCII value?

AD = 65.68 = 6568
BC = 66.67 = 6667

Haven't slept a lot, perhaps what i write is all off.


There will be many collisions. Go for Extendible Hashing:

Wikipedia

Algorithm explained


If you're going to construct a sorted array anyway, there's no point in computing this (lossy, biased) hash value and searching for that in the sorted list - it will be just as fast to do a binary search on the list for the initials directly.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜