Efficient incremental implementation of poset
I'm implementing, on top of SQLAlchemy, a structure with the mathematical characteristics of a partially ordered set, in which I need to be able to add and remove edges one at a time.
In my current best design I use two adjacency lists. One is the assignment list (approximately the edges of the Hasse diagram), since I need to preserve which pairs of nodes were explicitly set as ordered. The other is the transitive closure of the first, so that I can efficiently query whether one node is ordered with respect to another. Right now, I recompute the transitive closure each time an edge is added to or removed from the assignment adjacency list.
It looks something like this:
assignment = Table('assignment', metadata,
    Column('parent', Integer, ForeignKey('node.id')),
    Column('child', Integer, ForeignKey('node.id')))

closure = Table('closure', metadata,
    Column('ancestor', Integer, ForeignKey('node.id')),
    Column('descendent', Integer, ForeignKey('node.id')))

class Node(Base):
    __tablename__ = 'node'
    id = Column(Integer, primary_key=True)

    # This node is the child in the assignment row; the related nodes are its parents.
    parents = relationship('Node', secondary=assignment,
        backref='children',
        primaryjoin=id == assignment.c.child,
        secondaryjoin=id == assignment.c.parent)

    # Read-only view of the closure table, which recompute_ancestry() maintains.
    ancestors = relationship('Node', secondary=closure,
        backref='descendents',
        primaryjoin=id == closure.c.descendent,
        secondaryjoin=id == closure.c.ancestor,
        viewonly=True)

    @classmethod
    def recompute_ancestry(cls, conn):
        # Rebuild the whole closure table from the assignment table.
        conn.execute(closure.delete())
        adjacent_values = conn.execute(assignment.select()).fetchall()
        conn.execute(closure.insert(), floyd_warshall(adjacent_values))
where floyd_warshall() is an implementation of the algorithm of the same name.
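floyd_warshall() itself is not shown in the question. A minimal sketch of what such a helper might look like follows. It assumes each input row unpacks as a (parent, child) pair and that the result should be a list of dicts keyed by the closure table's column names. The body is the reachability-only (Warshall) variant of the algorithm, written for illustration and not taken from the original code.

```python
def floyd_warshall(rows):
    """Return transitive-closure rows for an iterable of (parent, child) pairs.

    Hypothetical helper: the row shape and the output keys are assumptions
    chosen to match the 'closure' table in the question.
    """
    reach = {}     # node -> set of nodes reachable from it
    nodes = set()
    for parent, child in rows:
        nodes.update((parent, child))
        reach.setdefault(parent, set()).add(child)

    # Warshall: for every intermediate k, anything that reaches k
    # also reaches everything k reaches.
    for k in nodes:
        via_k = reach.get(k)
        if not via_k:
            continue
        for i in nodes:
            if k in reach.get(i, ()):
                reach[i] |= via_k

    return [{'ancestor': a, 'descendent': d}
            for a in sorted(reach)
            for d in sorted(reach[a])]
```

The returned list has the shape that conn.execute(closure.insert(), ...) accepts for a multi-row insert. The cost is roughly cubic in the number of nodes, which is why recomputing everything on each change gets slow.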
This is leading me to two problems. The first is that it doesn't seem to be very efficient, but I'm not sure what sort of algorithm I could use instead.
The second is more about the practicality of having to explicitly call Node.recompute_ancestry() each time an assignment occurs, and only after the assignments are flushed into the session and with the proper connections. If I want to see the changes reflected in the ORM, I have to flush the session again. It would be much easier, I think, if I could express the recompute-ancestry operation in terms of the ORM.
Well, I went and worked out the solution to my own problem. The gist of it is to apply the Floyd-Warshall algorithm to the intersection of the descendents of the parent node's ancestors with the ancestors of the child node's descendents, but to apply the output only to the union of the parent's ancestors and the child's descendents. I spent so much time on it that I ended up posting the process on my blog, but here is the code.
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

association_table = Table('edges', Base.metadata,
    Column('predecessor', Integer,
           ForeignKey('nodes.id'), primary_key=True),
    Column('successor', Integer,
           ForeignKey('nodes.id'), primary_key=True))

path_table = Table('paths', Base.metadata,
    Column('predecessor', Integer,
           ForeignKey('nodes.id'), primary_key=True),
    Column('successor', Integer,
           ForeignKey('nodes.id'), primary_key=True))
class Node(Base):
    __tablename__ = 'nodes'
    id = Column(Integer, primary_key=True)
    # extra columns

    def __repr__(self):
        return '<Node #%r>' % (self.id,)

    successors = relationship('Node', backref='predecessors',
        secondary=association_table,
        primaryjoin=id == association_table.c.predecessor,
        secondaryjoin=id == association_table.c.successor)

    before = relationship('Node', backref='after',
        secondary=path_table,
        primaryjoin=id == path_table.c.predecessor,
        secondaryjoin=id == path_table.c.successor)

    def __lt__(self, other):
        return other in self.before
    def add_successor(self, other):
        if other in self.successors:
            return
        self.successors.append(other)
        # every node at or above self now comes before every node at or
        # below other, so link each such pair that isn't linked yet.
        for ancestor in [self] + list(self.after):
            for descendent in [other] + list(other.before):
                if descendent not in ancestor.before:
                    ancestor.before.append(descendent)
    def del_successor(self, other):
        if not self < other:
            # nodes are not connected, do nothing!
            return
        if other not in self.successors:
            # nodes aren't adjacent, but this *could*
            # be a warning...
            return

        self.successors.remove(other)

        # we build up a set of nodes that will be affected by the removal
        # we just did.
        ancestors = set(other.after)
        descendents = set(self.before)

        # we also need to build up a list of nodes that will determine
        # where the paths may be. basically, we're looking for every
        # node that is both before some node in the descendents and
        # ALSO after the ancestors. Such nodes might not be comparable
        # to self or other, but may still be part of a path between
        # the nodes in ancestors and the nodes in descendents.
        ancestors_descendents = set()
        for ancestor in ancestors:
            ancestors_descendents.add(ancestor)
            for descendent in ancestor.before:
                ancestors_descendents.add(descendent)

        descendents_ancestors = set()
        for descendent in descendents:
            descendents_ancestors.add(descendent)
            for ancestor in descendent.after:
                descendents_ancestors.add(ancestor)

        search_set = ancestors_descendents & descendents_ancestors

        known_good = set()  # This is the 'paths' from the
                            # original algorithm.

        # as before, we need to initialize it with the paths we
        # know are good. this is just the successor edges in
        # the search set.
        for predecessor in search_set:
            for successor in search_set:
                if successor in predecessor.successors:
                    known_good.add((predecessor, successor))
        # We now can work our way through floyd_warshall to resolve
        # all adjacencies. The intermediate node has to be the outermost
        # loop; otherwise paths longer than two edges can be missed,
        # depending on iteration order.
        for intermediate in search_set:
            for predecessor in search_set:
                if (predecessor, intermediate) not in known_good:
                    continue
                for successor in search_set:
                    if (intermediate, successor) in known_good:
                        known_good.add((predecessor, successor))
        # sift through the bad nodes and update the links
        for ancestor in ancestors:
            for descendent in descendents:
                if descendent in ancestor.before \
                        and (ancestor, descendent) not in known_good:
                    ancestor.before.remove(descendent)
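The same bookkeeping can be tried out without a database. Below is a rough in-memory sketch, assuming plain hashable node labels and sets of (predecessor, successor) tuples in place of the two tables. Poset and brute_force_closure are hypothetical names introduced here only to compare the incremental result against a full recomputation.

```python
from itertools import product


def brute_force_closure(edges):
    """Reference transitive closure, recomputed from scratch."""
    paths = set(edges)
    while True:
        new = {(a, d) for a, b in paths for c, d in paths if b == c} - paths
        if not new:
            return paths
        paths |= new


class Poset:
    """Hypothetical in-memory stand-in for the 'edges' and 'paths' tables."""

    def __init__(self):
        self.edges = set()  # explicitly assigned pairs
        self.paths = set()  # transitive closure of edges

    def before(self, node):
        # nodes that come after `node` (node is before them)
        return {s for p, s in self.paths if p == node}

    def after(self, node):
        # nodes that come before `node`
        return {p for p, s in self.paths if s == node}

    def add_edge(self, pred, succ):
        if (pred, succ) in self.edges:
            return
        self.edges.add((pred, succ))
        uppers = self.after(pred) | {pred}
        lowers = self.before(succ) | {succ}
        self.paths.update(product(uppers, lowers))

    def del_edge(self, pred, succ):
        if (pred, succ) not in self.edges:
            return
        self.edges.discard((pred, succ))
        ancestors = self.after(succ)     # still includes pred
        descendents = self.before(pred)  # still includes succ

        # nodes that can lie on some ancestor-to-descendent path
        reach_down = set(ancestors)
        for a in ancestors:
            reach_down |= self.before(a)
        reach_up = set(descendents)
        for d in descendents:
            reach_up |= self.after(d)
        search_set = reach_down & reach_up

        # Warshall closure restricted to the search set
        good = {(p, s) for p, s in self.edges
                if p in search_set and s in search_set}
        for k in search_set:
            for i in search_set:
                if (i, k) not in good:
                    continue
                for j in search_set:
                    if (k, j) in good:
                        good.add((i, j))

        # drop only the affected pairs that lost their last path
        for pair in product(ancestors, descendents):
            if pair not in good:
                self.paths.discard(pair)
```

Building a small diamond (a before b and c, both before d), removing one side, and comparing poset.paths with brute_force_closure(poset.edges) is a quick way to see whether paths through the other side survive.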
Update the closure as you insert, and do so in terms of the ORM:
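def add_assignment(parent, child):
    """Add a parent-child relationship between two nodes."""
    uppers = [parent] + list(parent.ancestors)
    lowers = [child] + list(child.descendants)
    # Every node at or above parent now reaches every node at or below child.
    for upper in uppers:
        for lower in lowers:
            if lower not in upper.descendants:
                upper.descendants.append(lower)
    parent.children.append(child)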
If you need to delete assignments, this is faster in pure SQL:
def del_assignment(parent, child):
    parent.children.remove(child)
    head = [parent.id] + [node.id for node in parent.ancestors]
    tail = [child.id] + [node.id for node in child.descendants]
    session.flush()
    # This drops every head-to-tail closure row, including pairs that are
    # still connected through another path; those rows must be re-inserted.
    session.execute(closure.delete().where(and_(
        closure.c.ancestor.in_(head),
        closure.c.descendant.in_(tail))))
    session.expire_all()
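Since the bulk delete removes every head-to-tail row, pairs that are still linked by some other route have to be put back afterwards. One possible way to find them, sketched in plain Python with a breadth-first search over the remaining assignment edges, is shown below. surviving_pairs is a hypothetical helper, and it assumes the remaining edges have been loaded as (parent, child) id pairs.

```python
from collections import deque


def surviving_pairs(edges, head, tail):
    """Return the (ancestor, descendant) pairs from head x tail that are
    still connected through the remaining (parent, child) edges.

    Hypothetical helper, not part of the original answer.
    """
    children = {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)

    tail = set(tail)
    keep = set()
    for start in head:
        # breadth-first search from each head node
        seen = set()
        queue = deque(children.get(start, ()))
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            queue.extend(children.get(node, ()))
        keep.update((start, node) for node in seen & tail)
    return keep
```

The resulting pairs could then be inserted back into the closure table in a single multi-row insert before the session is expired.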