Ordering an array for maximal pairwise matches
I have an array:
array([[ 4, 10],
[ 4, 2],
[ 0, 7],
[ 5, 11],
[ 6, 8],
[ 3, 6],
[ 9, 7],
[ 2, 11],
[ 9, 5],
[ 8, 1]])
I want a method by which to order the value pairs so that as many as possible pairwise 2-element sets have a common value. This is an example of the desired ordered array:
array([[ 4, 10],
[ 4, 2],
[ 2, 11],
[ 5, 11],
[ 9, 5],
[ 9, 7],
[ 0, 7], #note the gap here:
[ 8, 1],
[ 6, 8],
[ 3, 6]])
Several conditions are true about these arrays. There are no duplicate pairs (ie: [1,0] nor [0,1] will appear elsewhere in the array if [0,1] already exists). No pair has the same value (ie: [1,1] will not exist). No pair will have more than two matches (iow: no value exists more than twice in the entire array.) but a pair can have as few as zero matches (note in the above array the gap across which there is no match).
Obviously, I can create every permutation of the array, but this seems brutish. I think there might be some way of cutting the deck and re-stacking in a logical way such that it becomes sorted in a small number of cuts. But before I go down that road, I want to: 1) Be sure that there is no numpy
or collections
function that already does this. 2) Know that there is no tricky-genius way to use numpy .sort()
(or similar) to do it. 3) Find whether this is a common task and there is algorithm that does this. ("Oh, that's the Blumen-Funke algorithm!")
Here is some code that generates shuffled test arrays and checks sorted arrays:
def shuffled(N=12, ans=1):
'''returns is now the unsorted test array'''
r = range(N)
random.shuffle(r开发者_开发知识库)
l = []
for i in range(N):
l.append((r[i-1], r[i]))
random.shuffle(l)
return np.array(l)[ans+1:]
# step 2: ???
def test_ordered(a):
'''checks if generated array has been sorted'''
c0 = a[1:,0]==a[:-1,0]
c1 = a[1:,0]==a[:-1,1]
c2 = a[1:,1]==a[:-1,0]
c3 = a[1:,1]==a[:-1,1]
cond = c0+c1+c2+c3
ans = sum(numpy.logical_not(cond))
# when sorted this should return the same number input into
# shuffled() as 'ans':
return ans
(Is this a subjective question? I'm being warned that it is.)
Results:
Thank you so much for all your help. Sven's solution was about 20% faster than Paul's and happily, they both run in linear time, Doug's answer did not solve the problem. There was a high, but also largely linear dependence of performance on the number of "breaks" or "gaps" in the input data. See the plot below. The 10,000 magnitude axis is N. the 0.5 axis is the percentage of breaks. the z axis is performance in seconds.
Thanks again!
You've described a graph where the vertices are numbers, and the edges are your pairs.
Your conditions specify that each number appears once or twice in the list. This means that connected components in your graph are lines (or cycles). You can find them using this algorithm:
- [Line exists] If possible, pick a number with degree 1 (that is, it only appears once in the list). Follow the chain of pairs as far as you can, adding them to the output and removing the traversed vertices from your graph.
- [Cycle exists] If there was no number with degree 1 it means all the components are cycles. Pick any vertex (it will have degree 2). Follow the pairs as before, adding them to the output and removing the traversed vertices, but this time stop when you reach your original number.
- Repeat, until you've used up all the vertices in the graph.
You can run this algorithm efficiently: maintain a set of vertices of degree 1 and another of degree 2. When you consume an edge (a pair in your original list), modify the sets appropriately: remove an endpoint of degree 1 from the first set, and move an endpoint from the degree 2 set to the degree 1 set. Alternatively, use a priority queue.
You'll also need an efficient lookup for your pairs: build a dict from vertex to a list of adjacent vertices.
Using these ideas, you can find a best ordering in linear time (assuming O(1) set and dict implementations).
Here's a somewhat clunky implementation.
import collections
def handle_vertex(v, vs):
if v in vs[0]:
vs[0].remove(v)
elif v in vs[1]:
vs[0].add(v)
vs[1].remove(v)
def follow(edgemap, start, vs):
"""Follow a path in the graph, yielding the edges."""
v0 = start
last = None
while True:
# All vertices that we can go to next.
next_v = [v for v in edgemap[v0] if v != last]
if not next_v:
# We've reached the end (we must have been a line).
return
assert len(next_v) == 1 or (v0 == start and len(next_v) == 2)
next_v = next_v[0]
# Remove the endpoints from the vertex-degree sets.
handle_vertex(v0, vs)
handle_vertex(next_v, vs)
yield v0, next_v
if next_v == start:
# We've got back to the start (we must have been a cycle).
return
v0, last = next_v, v0
def pairsort(edges):
edgemap = collections.defaultdict(list)
original_edges = {}
for a, b in edges:
# Build the adjacency table.
edgemap[a].append(b)
edgemap[b].append(a)
# Keep a map to remember the original order pairs appeared in
# so we can output edges correctly no matter which way round
# we store them.
original_edges[a, b] = [a, b]
original_edges[b, a] = [a, b]
# Build sets of degree 1 and degree 2 vertices.
vs = [set(), set()]
for k, v in edgemap.iteritems():
vs[len(v) - 1].add(k)
# Find all components that are lines.
while vs[0]:
v0 = vs[0].pop()
for e in follow(edgemap, v0, vs):
yield original_edges[e]
# Find all components that are cycles.
while vs[1]:
v0 = vs[1].pop()
for e in follow(edgemap, v0, vs):
yield original_edges[e]
input = [
[ 4, 10],
[ 4, 2],
[ 0, 7],
[ 5, 11],
[ 6, 8],
[ 3, 6],
[ 9, 7],
[ 2, 11],
[ 9, 5],
[ 8, 1]]
print list(pairsort(input))
I am not certain that i understand every detail in your question; but if i did, then this should do what you want.
The basic idea is very simple: for two 1D arrays, you can cycle through all of the pairwise combinations by lining them up, holding one still and rolling the second one forward one increment at a time. If you use NumPy's roll function, then the value(s) that fall off the end of the array as you roll it forward just get pushed back onto the front, like a treadmill.
After that's set up, then just diff the two vectors along the correct axis and sum the 0s. Keep track of those sums (tx, below) then get index corresponding to the max value of those sums, NP.argmax(tx).
import numpy as NP
# create some data:
c1 = NP.random.randint(0, 10, 15)
c2 = NP.random.randint(0, 10, 15)
c12 = NP.concatenate((c1, c2)).reshape(15, 2)
tx = [] # to hold the indices of the various orderings
for i in range(15) :
v = NP.diff(c12, axis=0)
num_equal_neighbors = NP.sum( NP.sum(v==0, axis=0) )
tx.append(num_equal_neighbors)
c2 = NP.roll(c2, 1)
c12[:,1] = c2
Now find which ordering of the two vectors gives the most 'pairwise' matches:
best_order = NP.argmax(tx)
therefore, the desired ordering occurs when the two 1D arrays are arranged such that the second array is rolled *best_order* number of places (and the first array is left as it is )
Here is a simpler implementation of basically the same algorithm described in Paul Hankin's answer, using different data structures. It also runs in linear time.
edges = [[4, 10], [4, 2], [0, 7], [5, 11], [6, 8],
[3, 6], [9, 7], [2, 11], [9, 5], [8, 1]]
def add_edge(vertex, adj):
edge = edges[adj[0]][:]
ordered_edges.append(edge[:])
edge.remove(vertex)
new_vertex = edge[0]
new_adj = adj_edges[new_vertex]
new_adj.remove(adj[0])
del adj[0]
return new_vertex, new_adj
adj_edges = {}
for i, edge in enumerate(edges):
for vertex in edge:
adj_edges.setdefault(vertex, []).append(i)
ordered_edges = []
for vertex, adj in adj_edges.iteritems():
while len(adj) == 1:
vertex, adj = add_edge(vertex, adj)
for vertex, adj in adj_edges.iteritems():
while adj:
vertex, adj = add_edge(vertex, adj)
See also
Longest path problem :
NP complete for general graphs, or shortest path with weights negated for acyclic.
Also, try the graph shown in
Spanning tree:
longest is harder than just long.
精彩评论