What is the order of the run time for an algorithm with this desired output?
There are N sets A1 to AN, each with string entries. The average size of a set is K.
For each Ai we wish to return a list (or a better data structure?) of N-1 sets excluding Ai ordered by how many elements the sets have in common with Ai?
Please don't be shy to give a detailed response with nice mathematical arguments...:)
Also is this a standard problem and can I read more about it somewhere?
Basically, you generate each element of the result list by intersecting pairs of sets. Each result list element needs N-1 intersections, which costs (N-1) * IntersectTime. For the N elements of the result this sums to N(N-1) * IntersectTime. Afterwards you have to sort N-1 sets, N times over, so the ordering alone costs O(N² log N).
IntersectTime depends on the implementation of the set; for a typical hash set it is O(k) in your case, where k is the average set size K.
So finally we have O(N²k) + O(N² log N) = O(N² (k+log N)) = (if we assume k > log N) O(N²k).
EDIT: if you really implement it, it is good to know that each intersection of two sets can be used for two of the result list elements. That means for A_1 you have to intersect with N-1 other sets, for A_2 with N-2 (the intersection with A_1 was already done for the first element), for A_3 with N-3 other sets, and finally for A_N with none. BUT this does not change the big-O time, it just halves the runtime of the intersection step.
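A minimal Python sketch of the approach described above, assuming the sets are Python `set` objects held in a list. The function name `rank_by_overlap` and the choice to return indices rather than the sets themselves are illustrative assumptions, not part of the question. Each unordered pair is intersected once and the count is reused for both directions, then each row is sorted.

```python
from itertools import combinations


def rank_by_overlap(sets):
    """For each index i, return the other indices ordered by
    how many elements sets[j] shares with sets[i] (most first)."""
    n = len(sets)
    overlap = [[0] * n for _ in range(n)]

    # N(N-1)/2 intersections: each pair is intersected once and the
    # count is stored for both (i, j) and (j, i).
    for i, j in combinations(range(n), 2):
        common = len(sets[i] & sets[j])
        overlap[i][j] = common
        overlap[j][i] = common

    # N sorts of N-1 entries each: O(N^2 log N) in total.
    return [
        sorted(
            (j for j in range(n) if j != i),
            key=lambda j: overlap[i][j],
            reverse=True,
        )
        for i in range(n)
    ]


example = [{"a", "b", "c"}, {"b", "c", "d"}, {"c", "x"}, {"y"}]
ranking = rank_by_overlap(example)
```

Here `ranking[0]` lists the indices of the other sets, starting with the one that shares the most entries with `example[0]`. Ties keep their original index order because Python's sort is stable.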
Here's my attempt -
I believe you can boil the process down into: O(N * (C + S))
Where N is the number of sets, C is the amount of time it takes to compare N-1 sets to set Ai, and S is the amount of time it takes to sort the N-1 sets.
The comparison is K items against K items, N-1 times, so comparing takes (N-1)K^2 time.
Sorting the N-1 sets should take (N-1) log(N-1) time with an efficient algorithm. For simplicity, we can shorten N-1 to just N.
So, the whole thing should run in O(N(NK^2 + N log(N)))
You should take this with a grain of salt, I haven't done anything with algorithms for quite a while. There may also be a more efficient way to compare the sets.
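To illustrate where the K^2 factor comes from and how it might be avoided, here is a small Python sketch contrasting the two ways of counting common entries. Both function names are hypothetical, and the sketch assumes the entries within each collection are distinct. The hashed version does an expected O(K) amount of work per pair instead of K^2.

```python
def count_common_naive(a, b):
    # Compares every entry of a with every entry of b:
    # about len(a) * len(b) string comparisons.
    return sum(1 for x in a for y in b if x == y)


def count_common_hashed(a, b):
    # Builds a hash set from the smaller collection, then probes it
    # once per entry of the larger one: expected linear work.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    lookup = set(small)
    return sum(1 for x in large if x in lookup)


left = ["apple", "pear", "plum", "fig"]
right = ["fig", "kiwi", "apple"]
naive_count = count_common_naive(left, right)
hashed_count = count_common_hashed(left, right)
```

Both functions return the same count; only the amount of work differs as K grows.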