pysqlite, query for duplicate entries with swapped columns
Currently I hav开发者_Python百科e a pysqlite db that I am using to store a list of road conditions. The source this list is generated from however is buggy and sometimes generates duplicates. Some of these duplicates will have the start and end points swapped but everything else the same.
The method i currently have looks like this:
def getDupes(self):
'''This method is used to return a list of dupilicate entries
'''
self.__curs.execute('SELECT * FROM roadCond GROUP BY road, start, end, cond, reason, updated, county, timestmp HAVING count(*)>1')
result = self.__curs.fetchall()
def getSwaps():
'''This method is used to grab the duplicates with swapped columns
'''
self.__curs.execute('SELECT * FROM roadCond WHERE ')
extra = self.__curs.fetchall()
return extrac
result.extend(getSwaps())
return result
The the initial query works but I am suspicious of it (I think there is a better way, I just don't know) but I am not all to sure how to make the inner method work.
Thank you ahead of time. :-D
Instead of the first query, you could use
SELECT DISTINCT * FROM roadCond
which will retrieve all the records from the table, removing any duplicates.
As for the inner method, this query will return all the records which have "duplicates" with start
and end
swapped. Note that, for each record with "duplicates", this query will return both the "original" and the "copy".
SELECT DISTINCT * FROM roadCond WHERE EXISTS (
SELECT * FROM roadCond rc2 WHERE
roadCond.road = rc2.road AND
roadCond.end = rc2.start AND roadCond.start = rc2.end AND
roadCond.cond = rc2.cond AND
... AND
roadCond.timestamp = rc2.timestamp)
Edit: To detect and remove "duplicates" with start
and end
swapped, you could make sure that your data always contains these values laid out in the same order:
UPDATE roadCond SET start = end, end = start WHERE end < start;
But this approach only works if it doesn't matter which is which.
精彩评论