Need to create a list of sets, from a list of sets whose members may be connected
I'm dealing with polygonal data in realtime here, but the problems quite simple. I have a huge list containing thousands of sets of polygon Indecies (Integers) and I need to simplify the list as "fast" as possible into a list of sets of "connected" Indecies. i.e. Any sets containing integers that are also in another set become one set in the result. I've read several possible solutions involving sets & graphs etc. All i'm after are a final list of sets which had any degree of commonality.
I'm dealing with lots of data here, but for simplicities sake here's some sample data:
setA = set([0,1,2])
setB = set([6,7,8,9])
setC = set([4,5,6])
setD = set([3,4,5,0])
setE = set([10,11,12])
se开发者_运维知识库tF = set([11,13,14,15])
setG = set([16,17,18,19])
listOfSets = [setA,setB,setC,setD,setE,setF,setG]
In this case I'm after a list with a result like this, although ordering is irrelevant:
connectedFacesListOfSets = [ set([0,1,2,3,4,5,6,7,8,9]), set([10,11,12,13,14,15]), set([16,17,18,19])]
I've looked for similar solutions, but the one with the highest votes gave incorrect results on my large test data.
Merge lists that share common elements
It's hard to tell the performance without a sufficiently large set, but here is some basic code to start from:
while True:
merged_one = False
supersets = [listOfSets[0]]
for s in listOfSets[1:]:
in_super_set = False
for ss in supersets:
if s & ss:
ss |= s
merged_one = True
in_super_set = True
break
if not in_super_set:
supersets.append(s)
print supersets
if not merged_one:
break
listOfSets = supersets
This works in 3 iterations on the provided data. And the output is as follows:
[set([0, 1, 2, 3, 4, 5]), set([4, 5, 6, 7, 8, 9]), set([10, 11, 12, 13, 14, 15]), set([16, 17, 18, 19])]
[set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), set([10, 11, 12, 13, 14, 15]), set([16, 17, 18, 19])]
[set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), set([10, 11, 12, 13, 14, 15]), set([16, 17, 18, 19])]
This is a union find problem.
Though I haven't used it, this Python code looks good to me.
http://code.activestate.com/recipes/577225-union-find/
Forgive the messed up caps (autocorrect...):
# the results cotainer
Connected = set()
sets = # some list of sets
# convert the sets to frozensets (which are hashable and can be added to sets themselves)
Sets = map(frozenset, sets)
for s1 in sets:
Res = copy.copy(s1)
For s2 in sets:
If s1 & s2:
Res = res | s2
Connected.add(res)
So.. I think I got it. It's a mess but I got it. Here's what I did:
def connected_valid(li):
for i, l in enumerate(li):
for j, k in enumerate(li):
if i != j and contains(l,k):
return False
return True
def contains(set1, set2):
for s in set1:
if s in set2:
return True
return False
def combine(set1, set2):
set2 |= set1
return set2
def connect_sets(li):
while not connected_valid(li):
s1 = li.pop(0)
s2 = li[0]
if contains(s1, s2):
li[0] = combine(s1,s2)
else:
li.append(s1)
return li
Then in the main function you'd do something like this:
setA = set([0,1,2])
setB = set([6,7,8,9])
setC = set([4,5,6])
setD = set([3,4,5,0])
setE = set([10,11,12])
setF = set([11,13,14,15])
setG = set([16,17,18,19])
connected_sets = connect_sets([setA,setB,setC,setD,setE,setF,setG,])
After running it, I got the following output
print connected_sets
[set([0,1,2,3,4,5,6,7,8,9]), set([10,11,12,13,14,15]), set([16,17,18,19])]
Hope that's what you're looking for.
EDIT: Added code to randomly generate sets:
# Creates a list of 4000 sets with a random number of values ranging from 0 to 20000
sets = []
ma = 0
mi = 21000
for x in range(4000):
rand_num = sample(range(20),1)[0]
tmp_set_li = sample(range(20000), rand_num)
sets.append(set(tmp_set_li))
The last 3 lines can be condensed into one if you really wanted to.
I tried to do something different: this algorithm loops once for each set and once for each element:
# Our test sets
setA = set([0,1,2])
setB = set([6,7,8,9])
setC = set([4,5,6])
setD = set([3,4,5,0])
setE = set([10,11,12])
setF = set([11,13,14,15])
setG = set([16,17,18,19])
list_of_sets = [setA,setB,setC,setD,setE,setF,setG]
# We will use a map to store our new merged sets.
# This map will work as an reference abstraction, so it will
# map set ids to the set or to other set id.
# This map may have an indirection level greater than 1
merged_sets = {}
# We will also use a map between indexes and set ids.
index_to_id = {}
# Given a set id, returns an equivalent set id that refers directly
# to a set in the merged_sets map
def resolve_id(id):
if not isinstance(id, (int, long)):
return None
while isinstance(merged_sets[id], (int, long)):
id = merged_sets[id]
return id
# Points the informed set to the destination id
def link_id(id_source, id_destination):
point_to = merged_sets[id_source]
merged_sets[id_source] = id_destination
if isinstance(point_to, (int, long)):
link_id(point_to, id_destination)
empty_set_found = False
# For each set
for current_set_id, current_set in enumerate(list_of_sets):
if len(current_set) == 0 and empty_set_found:
continue
if len(current_set) == 0:
empty_set_found = True
# Create a set id for the set and place it on the merged sets map
merged_sets[current_set_id] = current_set
# For each index in the current set
possibly_merged_current_set = current_set
for index in current_set:
# See if the index is free, i.e., has not been assigned to any set id
if index not in index_to_id:
# If it is free, then assign the set id to the index
index_to_id[index] = current_set_id
# ... and then go to the next index
else:
# If it is not free, then we may need to merge the sets
# Find out to which set we need to merge the current one,
# ... dereferencing if necessary
id_to_merge = resolve_id(index_to_id[index])
# First we check to see if the assignment is to the current set or not
if id_to_merge == resolve_id(merged_sets[current_set_id]):
continue
# Merge the current set to the one found
print 'Merging %d with %d' % (current_set_id, id_to_merge)
merged_sets[id_to_merge] |= possibly_merged_current_set
possibly_merged_current_set = merged_sets[id_to_merge]
# Map the current set id to the set id of the merged set
link_id(current_set_id, id_to_merge)
# Return all the sets in the merged sets map (ignore the references)
print [x for x in merged_sets.itervalues() if not isinstance(x, (int, long))]
It prints:
Merging 2 with 1
Merging 3 with 0
Merging 3 with 1
Merging 5 with 4
[set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), set([10, 11, 12, 13, 14, 15]), set([16, 17, 18, 19])]
精彩评论