Merging two arrays of a class
I have two arrays of class Record
. Class Record
is defined like this
class Record{
char* string; //the word string
int count; //frequency word appears
}
And these are the two arrays defined (already initialized)
Record recordarray1=new Record[9000000]; //contains 9000000 unsorted Records
Record recordarray2=new Record[8000000] //contains 8000000 unsorted Records
the purpose is to find strings that match between the two arrays and add them to a new array where their counts are added together, and if there is a string not in the other array then just add to the new array. To do this I have tried sorting the two arrays first, (in alphabetical order by strings), then comparing recordarray2
, if the string matches then advance recordarray2
's index otherwise advance recordarray1
's index until you find one. If you don't find it, then add it to the new array.
Unfortunately this method is WAY too slow, sorting itself takes 20+ seconds with STL sort. Is there a quicker standard method of sorting that i'm missing? 开发者_Python百科
If I've understood correctly your algorithm should take O( nlogn + mlogm
[sort both arrays] + n + m
[to go through the arrays and compare])
.
It may not be much of an optimization but you try to sort just one of the arrays and use binary search to check if the elements of the other array are present or not. So now it should take O( n
[to copy one array as the new array] + nlogn
[to sort it] + mlogn
[to binary search the elements of the second into the sorted new one] )
.
HTH
Sorting object might be expensive, so I would try to avoid this.
One faster way might be to create an index for each array using a std::hash_map with the string as has index and the array index as value. You get two containers that can be iterated at one time. The iterator for the lesser will be advanced until you find a match or the other points to a lesser value. This will lead you to a predictable iteration count.
The possible solution is to use unordered_map
. The algorithm whould be as following:
Put the first array into the map, using strings as keys and count as values.
For each member in the second array, check it against containment in the map.
If it exists there
Put the record into the new array, combining counts
Remove the record from the map
Else
Put the record into the new array
Iterate throug the remaining recors in the map and put the in to the new array.
The complexity of this algorithm is aproximatelty O(n+m)
I feel that sorting is not needed. You can use following algorithm.
- Start with the first element of
recordarray1
; put into the new array - Search elements in
recordarray2
. If the element is found incrementcount
in new array. Also set therecordarray2[N]::count
to negative value; so that it will not be checked again in step 3 - Put all the elements from
recordarray2
which doesn't have count set to negative into new array. If negativecount
is encountered then simply change it to positive.
Note: This algorithm doesn't take care if you have similar string elements in the same array. Also don't use string
as a variable name. As it's also a typename as std::string
.
精彩评论