开发者

Algorithm to merge two lists lacking comparison between them

I am looking for an algorithm to merge two sorted lists, but they lack a comparison operator between elements of one list and elements of the other. The resulting merged list may not be unique, but any result which satisfies the relative sort order of each list will do. More precisely:

Given:

  • Lists A = {a_1, ..., a_m}, and B = {b_1, ..., b_n}. (They may be considered sets, as well).
  • A precedence operator < defined among elements of each list such that a_i < a_{i+1}, and b_j < b_{j+1} for 1 <= i <= m and 1 <= j <= n.
  • The precedence operator is undefined between elements of A and B: a_i < b_j is not defined for any valid i and j.
  • An equality operator = defined among all elements of either A or B (it is defined between an element from A and an element from B).
  • No two elements from list A are equal, and the same holds for list B.

Produce: A list C = {c_1, ..., c_r} such that:

  • C = union(A, B); the elements of C are the union of elements from A and B.
  • If c_p = a_i, c_q = a_j, and a_i < a_j, then c_p < c_q. (The order of elements of the sublists of C corresponding to sets A and B should be preserved.
  • There exist no i and j such that c_i = c_j. (all duplicated elements between A and B are removed).

I hope this question makes sense and that I'm not asking something either terribly obvious, or something for which there is no solution.

Context:

A constructible number can be represented exactly in finitely many quadratic extensions to the field of rational numbers (using a binary tree of height equal to the number of field extensions). A representation of a constructible number must therefore "know" the field it is represented in. Lists A and B represent successive quadratic extensions of the rational numbers. Elements of A and B themselves are constructible numbers, which are defined in the context of previous smaller fields (hence the precedence operator). When adding/multiplying constructible numbers, the quadratically extended fields must first be merged so that the binary arithmetic operations can be performed; the resulting list C is the quadratically extended field which can represent numbers representable by both fields A and B. (If anyone has a better idea of how to programmatically work with constructible numbers, let me know. A question concerning constructible numbers has arisen before, and also here are some interesting responses about their representation.)

Before anyone asks, no, this question does not belong on mathoverflow; they hate algorithm (and generally non-graduate-level math) questions.

In practice, lists A and B are linked lists (stored in reverse order, actually). I will also need to keep track of which elements of C corresponded to which in A and B, but that is a minor detail. The algorithm I seek is not the merge operation in mergesort, because the precedence operator is not defined between elements of the two lists being merged. Everything will eventually be implemented in C++ (I just want the operator over开发者_C百科loading). This is not homework, and will eventually be open sourced, FWIW.


I don't think you can do it better than O(N*M), although I'd be happy to be wrong.

That being the case, I'd do this:

  • Take the first (remaining) element of A.
  • Look for it in (what's left of) B.
    • If you don't find it in B, move it to the output
    • If you do find it in B, move everything from B up to and including the match, and drop the copy from A.
  • Repeat the above until A is empty
  • Move anything left in B to the output

If you want to detect incompatible orderings of A and B, then remove "(what's left of)" from step 2. Search the whole of B, and raise an error if you find it "too early".

The problem is that given a general element of A, there is no way to look for it in B in better than linear time (in the size of B), because all we have is an equality test. But clearly we need to find the matches somehow and (this is where I wave my hands a bit, I can't immediately prove it) therefore we have to check each element of A for containment in B. We can avoid a bunch of comparisons because the orders of the two sets are consistent (at least, I assume they are, and if not there's no solution).

So, in the worst case the intersection of the lists is empty, and no elements of A are order-comparable with any elements of B. This requires N*M equality tests to establish, hence the worst-case bound.

For your example problem A = (1, 2, c, 4, 5, f), B = (a, b, c, d, e, f), this gives the result (1,2,a,b,c,4,5,d,e,f), which seems good to me. It performs 24 equality tests in the process (unless I can't count): 6 + 6 + 3 + 3 + 3 + 3. Merging with A and B the other way around would yield (a,b,1,2,c,d,e,4,5,f), in this case with the same number of comparisons, since the matching elements just so happen to be at equal indices in the two lists.

As can be seen from the example, the operation can't be repeated. merge(A,B) results in a list with an order inconsistent with that of merge(B,A). Hence merge((merge(A,B),merge(B,A)) is undefined. In general, the output of a merge is arbitrary, and if you go around using arbitrary orders as the basis of new complete orders, you will generate mutually incompatible orders.


This sounds like it would use a degenerate form of topological sorting.

EDIT 2:

Now with a combined routine:

import itertools

list1 = [1, 2, 'c', 4, 5, 'f', 7]
list2 = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

ibase = 0
result = []

for n1, i1 in enumerate(list1):
  for n2, i2 in enumerate(itertools.islice(list2, ibase, None, 1)):
    if i1 == i2:
      result.extend(itertools.islice(list2, ibase, ibase + n2))
      result.append(i2)
      ibase = n2 + ibase + 1
      break
  else:
    result.append(i1)
result.extend(itertools.islice(list2, ibase, None, 1))

print result


Would concatenating the two lists be sufficient? It does preserve the relative sortedness of elements from a and elements from b.

Then it's just a matter of removing duplicates.

EDIT: Alright, after the comment discussion (and given the additional condition that a_i=b_i & a_j=b_j & a_i<a_j => b_i<b-J), here's a reasonable solution:

  1. Identify entries common to both lists. This is O(n2) for the naive algorithm - you might be able to improve it.

  2. (optional) verify that the common entries are in the same order in both lists.

  3. Construct the result list: All elements of a that are before the first shared element, followed by all elements of b before the first shared element, followed by the first shared element, and so on.


Given the problem as you have expressed it, I have a feeling that the problem may have no solution. Suppose that you have two pairs of elements {a_1, b_1} and {a_2, b_2} where a_1 < a_2 in the ordering of A, and b_1 > b_2 in the ordering of B. Now suppose that a_1 = b_1 and a_2 = b_2 according to the equality operator for A and B. In this scenario, I don't think you can create a combined list that satisfies the sublist ordering requirement.

Anyway, there's an algorithm that should do the trick. (Coded in Java-ish ...)

List<A> alist = ...
List<B> blist = ...
List<Object> mergedList = new SomeList<Object>(alist);
int mergePos = 0;
for (B b : blist) {
    boolean found = false;
    for (int i = mergePos; i < mergedList.size(); i++) {
        if (equals(mergedList.get(i), b)) {
            found = true; break;
        }
    }
    if (!found) {
        mergedList.insertBefore(b, mergePos);
        mergePos++;
    }
}

This algorithm is O(N**2) in the worst case, and O(N) in the best case. (I'm skating over some Java implementation details ... like combining list iteration and insertion without a major complexity penalty ... but I think it can be done in this case.)

The algorithm neglects the pathology I mentioned in the first paragraph and other pathologies; e.g. that an element of B might be "equal to" multiple elements of A, or vice versa. To deal with these, the algorithm needs to check each b against all elements of the mergedList that are not instances of B. That makes the algorithm O(N**2) in the best case.


If the elements are hashable, this can be done in O(N) time where N is the total number of elements in A and B.

def merge(A, B):
    # Walk A and build a hash table mapping its values to indices.
    amap = {}
    for i, a in enumerate(A):
        amap[a] = i

    # Now walk B building C.
    C = []
    ai = 0
    bi = 0
    for i, b in enumerate(B):
        if b in amap:
            # b is in both lists.
            new_ai = amap[b]
            assert new_ai >= ai  # check for consistent input
            C += A[ai:new_ai]    # add non-shared elements from A
            C += B[bi:i]         # add non-shared elements from B
            C.append(b)          # add the shared element b
            ai = new_ai + 1
            bi = i + 1
    C += A[ai:]  # add remaining non-shared elements from A
    C += B[bi:]  # from B
    return C

A = [1, 2, 'c', 4, 5, 'f', 7]
B = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print merge(A, B)

(This is just an implementation of Anon's algorithm. Note that you can check for inconsistent input lists without hurting performance and that random access into the lists is not necessary.)

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜