开发者

Examining then updating rows in sqlite with Python

I've decided that I'm not fond of the automatic filing and renaming provided by Mendeley, but have found that it keeps track of everything in an sqlite database that I can easily read and modify from Python.

My question is, if I'm going to iterate over the rows of a table containing file paths and hashes that are serving as identifiers that are used elsewhere in the database, what is an appropriate pattern to use to update the paths as I move along, check if the file needs renaming, and the开发者_StackOverflow中文版n want to update the row in the db as I move the file.

I started putting together some Python to do this, but it seems to me that I probably need to do a query upfront to get all the rows I'm going to later iterate through, and then subsequently do REPLACE or UPDATE statements for files I've decided to move.

I'm not familiar with the internals of the sqlite3 module, but I presume it would be a bad idea to do REPLACE/UPDATEs while iterating on the cursor used for the original select.

Is there another way to do this that doesn't require going back with REPLACE/UPDATEs? I do need to do queries from other tables, using the file hash to get some other metadata in order to construct the rename.


I don't think there's a way to modify the rows in the table other than via SQL -- UPDATE, or INSERT OR REPLACE. (It's tempting to think that the sqlite3.Row object might allow assignment, and write back to the row it represents, but no.)

As for doing UPDATEs (via a 2nd cursor) while there's an outstanding cursor used for the original SELECT -- you ask if this is a bad idea -- I don't know if you're worried about correctness or performance (correctness: the cursor gets confused and doesn't iterate all the rows it should exactly once; performance: the cursor does iterate all the rows it should exactly once but there are a bunch of expensive extra queries)?

WRT correctness, this seems to work fine -- I just did a quick test where I created two cursors, c1 and c2, then on c1 executed a SELECT command, then on c2 executed an UPDATE command affecting the same rows, then looked at the result of c1.fetchall(). It still contained all the right rows, though the data in the first row was stale (probably fetched with the initial query) and the data in the later rows was updated (probably fetched only with the fetchall after the update).

(Also, if the UPDATE on c2 affects which records would be returned by the SELECT on c1, such records will no longer be fetched, but obviously anything already fetched was already fetched, and again the cursor.execute() seems to fetch the first record immediately, so using the same command order as above and with an UPDATE command that would make the original SELECT find nothing, the later c1.fetchall() still returns one record.)

As for performance, I don't know -- from the preceding paragraph it looks to me like it has to repeat the query, which may or may not be expensive depending on the query. (It may not be literally repeating it; perhaps this work is deferred till the actual fetch anyway.)

Short story: I don't think there's another way to do this without UPDATE or REPLACE, and I think it will work ok to do those UPDATEs even while iterating on the original cursor.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜