Assuming a many to many relationship model in sqlalchemy with mysql, how to update data ignoring duplicates?
I am a little stuck with sqlalchemy trying to update some data.
I have a many to many and a one to many relationship. The first is a relationship between an author and the possible spellings of his name. The second is l开发者_C百科inking authors to their written literature. A paper may have several authors and vice versa.
Assuming an author "Peter Shaw" who has already 4 papers stored and linked to him in the database. No I want to "add" a new set of 6 papers for "Peter Shaw". Unfortunately 4 of the 6 papers are already stored in the database. This is why session.commit()
results in a duplicate error.
Is there a common way to avoid the duplicate errors, and to tell sqlalchemy to just fill in the holes instead of complaining about the duplicates? Neither the docus of sqlalchemy nor google could enlighten me with an explicit answer/approach, so any suggestions are well apreciated.
These are the models I am testing with:
class NameSpelling(Base):
__tablename__ = 'name_spellings'
id = Column(Integer, primary_key=True)
name = Column(String(255), nullable=False, unique=True, index=True)
authors_id = Column(Integer, ForeignKey('authors.id'))
def __init__(self, name=None):
self.name = name
def __repr__(self):
return "NameSpelling(%r)" % (self.name)
class Author(Base):
__tablename__ = 'authors'
id = Column(Integer, primary_key=True)
name = Column(String(255), nullable=True, unique=True, index=True)
papers = relationship('Paper',
secondary=author_paper,
backref='authors')
name_spellings = relationship(NameSpelling,
order_by=NameSpelling.id,
backref="author",
cascade="all, delete, delete-orphan")
def __init__(self, name=None):
self.name = name
def __repr__(self):
return "Authors(%r, %r)" % (self.name_spellings, self.name)
class Paper(Base):
__tablename__ = 'papers'
id = Column(Integer, primary_key=True)
title = Column(String(1500), nullable=False, index=True)
url = Column(String(255), nullable=False, unique=True, index=True)
date = Column(Date(), nullable=True)
def __init__(self, title=None, url=None, date=None):
self.title = title
self.url = url
self.date = date
def __repr__(self):
return "Paper(%r)" % (self.title)
I have the exact same problem with an SQLAlchemy project. What I ended up doing (and what is likely a bad way of handling the issue), is to check the relationship collections before adding a new instance to the session and replacing the related instances with the result of session.merge(), if any.
It looks somewhat like this:
def add_instance_to_session(instance, session):
'''
Add instance to session, while checking for existing child instances in
the relationship collection instance.child_list.
'''
def _merge_and_replace(child):
with session.no_autoflush:
merged_child = session.merge(child)
if id(merged_child) != id(child):
try:
session.expunge(child)
except sqlalchemy.exc.InvalidRequestError:
# child wasn't in the session to begin with
pass
return merged_child
else:
return child
instance.child_list = map(_merge_and_replace, instance.child_list)
session.add(instance)
This seems to work for me, but comes across as pretty bad performace-wise, especially if you have many childs. Maybe there is a better way utilizing the ON DUPLICATE KEY idiom that mySQL offers, or similar constructs.
[edit] The session.expunge() part is probably unnecessary if only the above method is used to add instances to a session, as the children cannot be in the session at this point. At least that's how I think it is...
精彩评论