Assuming a many to many relationship model in sqlalchemy with mysql, how to update data ignoring duplicates?

2023-03-16 10:23 问答作者：

I am a little stuck with sqlalchemy trying to update some data.

I have a many to many and a one to many relationship. The first is a relationship between an author and the possible spellings of his name. The second is l开发者_C百科inking authors to their written literature. A paper may have several authors and vice versa.

Assuming an author "Peter Shaw" who has already 4 papers stored and linked to him in the database. No I want to "add" a new set of 6 papers for "Peter Shaw". Unfortunately 4 of the 6 papers are already stored in the database. This is why session.commit() results in a duplicate error.

Is there a common way to avoid the duplicate errors, and to tell sqlalchemy to just fill in the holes instead of complaining about the duplicates? Neither the docus of sqlalchemy nor google could enlighten me with an explicit answer/approach, so any suggestions are well apreciated.

These are the models I am testing with:

class NameSpelling(Base):
    __tablename__ = 'name_spellings'

    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False, unique=True, index=True)
    authors_id = Column(Integer, ForeignKey('authors.id'))

    def __init__(self, name=None):
        self.name = name

    def __repr__(self):
        return "NameSpelling(%r)" % (self.name)

class Author(Base):
    __tablename__ = 'authors'

    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=True, unique=True, index=True)

    papers = relationship('Paper',
                          secondary=author_paper,
                          backref='authors')

    name_spellings = relationship(NameSpelling,
                                  order_by=NameSpelling.id,
                                  backref="author",
                                  cascade="all, delete, delete-orphan")

    def __init__(self, name=None):
        self.name = name

    def __repr__(self):
        return "Authors(%r, %r)" % (self.name_spellings, self.name)


class Paper(Base):
    __tablename__ = 'papers'

    id = Column(Integer, primary_key=True)
    title = Column(String(1500), nullable=False, index=True)
    url = Column(String(255), nullable=False, unique=True, index=True)
    date = Column(Date(), nullable=True)

    def __init__(self, title=None, url=None, date=None):
        self.title = title
        self.url = url
        self.date = date

    def __repr__(self):
        return "Paper(%r)" % (self.title)

I have the exact same problem with an SQLAlchemy project. What I ended up doing (and what is likely a bad way of handling the issue), is to check the relationship collections before adding a new instance to the session and replacing the related instances with the result of session.merge(), if any.

It looks somewhat like this:

def add_instance_to_session(instance, session):
    '''
    Add instance to session, while checking for existing child instances in
    the relationship collection instance.child_list.
    '''
    def _merge_and_replace(child):
        with session.no_autoflush:
            merged_child = session.merge(child)
            if id(merged_child) != id(child):
                try:
                    session.expunge(child)
                except sqlalchemy.exc.InvalidRequestError:
                    # child wasn't in the session to begin with
                    pass
                return merged_child
            else:
                return child
    instance.child_list = map(_merge_and_replace, instance.child_list)
    session.add(instance)

This seems to work for me, but comes across as pretty bad performace-wise, especially if you have many childs. Maybe there is a better way utilizing the ON DUPLICATE KEY idiom that mySQL offers, or similar constructs.

[edit] The session.expunge() part is probably unnecessary if only the above method is used to add instances to a session, as the children cannot be in the session at this point. At least that's how I think it is...

继续阅读：duplicates many-to-many python

Assuming a many to many relationship model in sqlalchemy with mysql, how to update data ignoring duplicates?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？