开发者

Need to create a list of sets, from a list of sets whose members may be connected

I'm dealing with polygonal data in realtime here, but the problems quite simple. I have a huge list containing thousands of sets of polygon Indecies (Integers) and I need to simplify the list as "fast" as possible into a list of sets of "connected" Indecies. i.e. Any sets containing integers that are also in another set become one set in the result. I've read several possible solutions involving sets & graphs etc. All i'm after are a final list of sets which had any degree of commonality.

I'm dealing with lots of data here, but for simplicities sake here's some sample data:

setA = set([0,1,2])
setB = set([6,7,8,9])
setC = set([4,5,6])
setD = set([3,4,5,0])
setE = set([10,11,12])
se开发者_运维知识库tF = set([11,13,14,15])
setG = set([16,17,18,19])

listOfSets = [setA,setB,setC,setD,setE,setF,setG]

In this case I'm after a list with a result like this, although ordering is irrelevant:

connectedFacesListOfSets = [ set([0,1,2,3,4,5,6,7,8,9]), set([10,11,12,13,14,15]), set([16,17,18,19])]

I've looked for similar solutions, but the one with the highest votes gave incorrect results on my large test data.

Merge lists that share common elements


It's hard to tell the performance without a sufficiently large set, but here is some basic code to start from:

while True:
    merged_one = False
    supersets = [listOfSets[0]]

    for s in listOfSets[1:]:
        in_super_set = False
        for ss in supersets:
            if s & ss:
               ss |= s
               merged_one = True
               in_super_set = True
               break

        if not in_super_set:
            supersets.append(s)

    print supersets
    if not merged_one:
        break

    listOfSets = supersets       

This works in 3 iterations on the provided data. And the output is as follows:

[set([0, 1, 2, 3, 4, 5]), set([4, 5, 6, 7, 8, 9]), set([10, 11, 12, 13, 14, 15]), set([16, 17, 18, 19])]
[set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), set([10, 11, 12, 13, 14, 15]), set([16, 17, 18, 19])]
[set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), set([10, 11, 12, 13, 14, 15]), set([16, 17, 18, 19])]


This is a union find problem.

Though I haven't used it, this Python code looks good to me.

http://code.activestate.com/recipes/577225-union-find/


Forgive the messed up caps (autocorrect...):

# the results cotainer
Connected = set()

sets = # some list of sets

# convert the sets to frozensets (which are hashable and can be added to sets themselves)
Sets = map(frozenset, sets)

for s1 in sets:
    Res = copy.copy(s1)
    For s2 in sets:
        If s1 & s2:
            Res = res | s2
    Connected.add(res)  


So.. I think I got it. It's a mess but I got it. Here's what I did:

def connected_valid(li):
    for i, l in enumerate(li):
        for j, k in enumerate(li):
            if i != j and contains(l,k):
                return False
    return True

def contains(set1, set2):
    for s in set1:
        if s in set2:
            return True
    return False

def combine(set1, set2):
    set2 |= set1
    return set2

def connect_sets(li):
    while not connected_valid(li):
        s1 = li.pop(0)
        s2 = li[0]
        if contains(s1, s2):
            li[0] = combine(s1,s2)
        else:
            li.append(s1)
    return li

Then in the main function you'd do something like this:

setA = set([0,1,2])
setB = set([6,7,8,9])
setC = set([4,5,6])
setD = set([3,4,5,0])
setE = set([10,11,12])
setF = set([11,13,14,15])
setG = set([16,17,18,19])

connected_sets = connect_sets([setA,setB,setC,setD,setE,setF,setG,])

After running it, I got the following output

print connected_sets
[set([0,1,2,3,4,5,6,7,8,9]), set([10,11,12,13,14,15]), set([16,17,18,19])]

Hope that's what you're looking for.

EDIT: Added code to randomly generate sets:

# Creates a list of 4000 sets with a random number of values ranging from 0 to 20000
sets = []
ma = 0
mi = 21000
for x in range(4000):
    rand_num = sample(range(20),1)[0]
    tmp_set_li = sample(range(20000), rand_num)
    sets.append(set(tmp_set_li))

The last 3 lines can be condensed into one if you really wanted to.


I tried to do something different: this algorithm loops once for each set and once for each element:

# Our test sets
setA = set([0,1,2])
setB = set([6,7,8,9])
setC = set([4,5,6])
setD = set([3,4,5,0])
setE = set([10,11,12])
setF = set([11,13,14,15])
setG = set([16,17,18,19])

list_of_sets = [setA,setB,setC,setD,setE,setF,setG]

# We will use a map to store our new merged sets.
# This map will work as an reference abstraction, so it will
# map set ids to the set or to other set id.
# This map may have an indirection level greater than 1
merged_sets = {}

# We will also use a map between indexes and set ids.
index_to_id = {}

# Given a set id, returns an equivalent set id that refers directly
# to a set in the merged_sets map
def resolve_id(id):
    if not isinstance(id, (int, long)):
        return None
    while isinstance(merged_sets[id], (int, long)):
        id = merged_sets[id]
    return id


# Points the informed set to the destination id
def link_id(id_source, id_destination):
    point_to = merged_sets[id_source]
    merged_sets[id_source] = id_destination
    if isinstance(point_to, (int, long)):
        link_id(point_to, id_destination)


empty_set_found = False
# For each set
for current_set_id, current_set in enumerate(list_of_sets):
    if len(current_set) == 0 and empty_set_found:
        continue
    if len(current_set) == 0:
        empty_set_found = True
    # Create a set id for the set and place it on the merged sets map
    merged_sets[current_set_id] = current_set
    # For each index in the current set
    possibly_merged_current_set = current_set
    for index in current_set:
        # See if the index is free, i.e., has not been assigned to any set id
        if index not in index_to_id:
            # If it is free, then assign the set id to the index
            index_to_id[index] = current_set_id
            # ... and then go to the next index
        else:
            # If it is not free, then we may need to merge the sets
            # Find out to which set we need to merge the current one,
            # ... dereferencing if necessary
            id_to_merge = resolve_id(index_to_id[index])
            # First we check to see if the assignment is to the current set or not
            if id_to_merge == resolve_id(merged_sets[current_set_id]):
                continue
            # Merge the current set to the one found
            print 'Merging %d with %d' % (current_set_id, id_to_merge)
            merged_sets[id_to_merge] |= possibly_merged_current_set
            possibly_merged_current_set = merged_sets[id_to_merge]
            # Map the current set id to the set id of the merged set
            link_id(current_set_id, id_to_merge)
# Return all the sets in the merged sets map (ignore the references)
print [x for x in merged_sets.itervalues() if not isinstance(x, (int, long))]

It prints:

Merging 2 with 1
Merging 3 with 0
Merging 3 with 1
Merging 5 with 4
[set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), set([10, 11, 12, 13, 14, 15]), set([16, 17, 18, 19])]
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜