开发者

How to get a group of names all with the same first letter from an alphabetically sorted list?

I was wondering what is the best way to get a group of names given the first letter. The present appl开发者_JAVA百科ication I am working in is in javascript, but I had a similar problem in another language sometimes ago. One idea I have thought of would be to do a binary search for the end of the names from a particular letter and then do another binary search for the beginning. Another idea was to take the ratio of of the distance of the given letter from the beginning and applying that ratio to find where to start the search. For example if the letter was 'e' then I would start start a quarter of the way through the list, and do some kind of search to see how close I am to the letter I need. The program will be working with several hundred names so I really didn't want to just do a for loop and search the whole thing. Also, I am interested what kind of algorithms for this are out there?


Both your approaches have their advantages and disadvantages. Binary search gives exactly O(log(N)) complexity, and your second method will give approximately O(log(N)) with some advantage for uniform distribution of names and possibly disadvantage for another type of distribution. What is better is up to your needs.

One big improvement I can propose is to index character positions while creating names list. Make simple hash map with first letters as keys and start positions as values. It will take O(N), but only once, and then you will get exact position for each letter in a constant time. For JavaScript you can do it, for example, while loading data to the page, when you walk trough the list anyway.


Guys I think we could use an approach similar to count sort.We could create an array of size 26 .This array would not be an normal array but would be an array of pointers to linked list which has the following structure.

Struct node { char *ptr ; struct node *next; };

struct node * names[26]; //Our array.

Now we would scan the list in O(n) time and corresponding to the first character we could subtract 65 (if ASCII value of letter is in the range 65 - 90).Guys i am subtracting 65 so as to fix the letter in 26 sized array. At each location we could create a linked list and can store the corresponding words in that location.

Now suppose if we want to find all letters that begin with D we could directly do to array location 3(No need to apply hash function again) and then traverse linked list created till null is reached.

And what i think space complexity required in hashing would be same as that of above but hashing would also involve computing hash function every time when we want to insert or search for words beginning with same letter.


If the plan is to do something with the names (as opposed to just find out how many there are), then it will be necessary to scan the names that fit the criteria of matching the first letter. If so, then it seems that a binary search for the first name in the entire set is the fastest method. The "do something" part would involve scanning the names starting from the location found by the binary search. When a name is read that no longer starts with the given letter, you are done.


If you have an unsorted set of filenames then I would propose following algorithm:

1) Create two variables: 1) currently found first letter (I will call it currentLetter) 2) list of filenames which start with this letter (currentFilenames)
2) firstLetter = null
currentFilenames = [] - empty list or array
3) Iterate over filenames. If current filenames starts with currentLetter then add this filename to the currentFilenames. If it starts with letter which goes before currentLetter then assign currentLetter to the first letter of new filename and create a new currentFilenames list which consists only of one current filename.

With such an algorithm you will have at the end a letter which goes first in the alphabet and list of files starting from that letter.

Sample code (tried to write in Javascript but do not blame if I wrote anything wrong):

function GetFirstLetterAndFilenames(allFilenames) {
    var currentLetter = null;
    var currentFilenames = null;

    for (int i = 0; i < allFilenames.length ; i++) {
        var thisLetter = allFilenames[i][0];
        if (currentLetter == null || thisLetter < currentLetter) {
            currentLetter = thisLetter;
            currentFilenames = [allFilenames[i]];
        } else if (currentLetter == thisLetter) {
            currentFilenames.push(allFilenames[i]); 
        }
    }

    return new {lowestLetter = currentLetter, filenames = currentFilenames};
}


Names have a funny way of not distributing themselves evenly over the alphabet, so you're probably not going to win by as much as you'd hope by predicting where to look.

But a really easy way to cut your search down by an average of two steps is as follows: if the letter is from a to m, binary search for the next letter. Then binary search from the beginning of the list only to the position you just found for the next letter. If the letter is from n to z, binary search for it. Then, again, only search the portion of the list after what you just found.

Is this worth saving two steps? Dunno. It's pretty easy to implement, but then again, two steps don't take very long. (Correctly guessing the letter would save you maybe 4 steps at best.)

Another possibility is to have bins for each letter to begin with. It starts out already sorted, and if you have to re-sort, you only have to sort within one letter, not the whole list. The downside is that if you need to manipulate the whole list frequently, you have to glue all the bins together.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜