开发者

Merging two arrays of a class

I have two arrays of class Record. Class Record is defined like this

class Record{
char* string; //the word string
int count; //frequency word appears
}

And these are the two arrays defined (already initialized)

Record recordarray1=new Record[9000000];  //contains 9000000 unsorted Records
Record recordarray2=new Record[8000000]  //contains 8000000 unsorted Records

the purpose is to find strings that match between the two arrays and add them to a new array where their counts are added together, and if there is a string not in the other array then just add to the new array. To do this I have tried sorting the two arrays first, (in alphabetical order by strings), then comparing recordarray2, if the string matches then advance recordarray2's index otherwise advance recordarray1's index until you find one. If you don't find it, then add it to the new array.

Unfortunately this method is WAY too slow, sorting itself takes 20+ seconds with STL sort. Is there a quicker standard method of sorting that i'm missing? 开发者_Python百科


If I've understood correctly your algorithm should take O( nlogn + mlogm [sort both arrays] + n + m [to go through the arrays and compare]).
It may not be much of an optimization but you try to sort just one of the arrays and use binary search to check if the elements of the other array are present or not. So now it should take O( n [to copy one array as the new array] + nlogn [to sort it] + mlogn [to binary search the elements of the second into the sorted new one] ).

HTH


Sorting object might be expensive, so I would try to avoid this.

One faster way might be to create an index for each array using a std::hash_map with the string as has index and the array index as value. You get two containers that can be iterated at one time. The iterator for the lesser will be advanced until you find a match or the other points to a lesser value. This will lead you to a predictable iteration count.


The possible solution is to use unordered_map. The algorithm whould be as following:

Put the first array into the map, using strings as keys and count as values. 
For each member in the second array, check it against containment in the map.
    If it exists there
        Put the record into the new array, combining counts
        Remove the record from the map
    Else
        Put the record into the new array
Iterate throug the remaining recors in the map and put the in to the new array.

The complexity of this algorithm is aproximatelty O(n+m)


I feel that sorting is not needed. You can use following algorithm.

  1. Start with the first element of recordarray1; put into the new array
  2. Search elements in recordarray2. If the element is found increment count in new array. Also set the recordarray2[N]::count to negative value; so that it will not be checked again in step 3
  3. Put all the elements from recordarray2 which doesn't have count set to negative into new array. If negative count is encountered then simply change it to positive.

Note: This algorithm doesn't take care if you have similar string elements in the same array. Also don't use string as a variable name. As it's also a typename as std::string.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜