开发者

pysqlite, query for duplicate entries with swapped columns

Currently I hav开发者_Python百科e a pysqlite db that I am using to store a list of road conditions. The source this list is generated from however is buggy and sometimes generates duplicates. Some of these duplicates will have the start and end points swapped but everything else the same.

The method i currently have looks like this:

def getDupes(self):
    '''This method is used to return a list of dupilicate entries 
    '''
    self.__curs.execute('SELECT * FROM roadCond GROUP BY road, start, end, cond, reason, updated, county, timestmp HAVING count(*)>1')
    result = self.__curs.fetchall()

    def getSwaps():
        '''This method is used to grab the duplicates with swapped columns
        '''
        self.__curs.execute('SELECT * FROM roadCond WHERE ')
        extra = self.__curs.fetchall()
        return extrac

    result.extend(getSwaps())

    return result

The the initial query works but I am suspicious of it (I think there is a better way, I just don't know) but I am not all to sure how to make the inner method work.

Thank you ahead of time. :-D


Instead of the first query, you could use

SELECT DISTINCT * FROM roadCond

which will retrieve all the records from the table, removing any duplicates.

As for the inner method, this query will return all the records which have "duplicates" with start and end swapped. Note that, for each record with "duplicates", this query will return both the "original" and the "copy".

SELECT DISTINCT * FROM roadCond WHERE EXISTS (
    SELECT * FROM roadCond rc2 WHERE
        roadCond.road = rc2.road AND
        roadCond.end = rc2.start AND roadCond.start = rc2.end AND
        roadCond.cond = rc2.cond AND
        ... AND
        roadCond.timestamp = rc2.timestamp)

Edit: To detect and remove "duplicates" with start and end swapped, you could make sure that your data always contains these values laid out in the same order:

UPDATE roadCond SET start = end, end = start WHERE end < start;

But this approach only works if it doesn't matter which is which.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜