How to lock some rows so they are not selected in other transactions

2023-02-05 13:38 Q&A author:

I have a table that is essentially a list of URLs I want to visit. The table neither references nor is referenced by other tables. What my application does is:

select some rows from the list of URL
start a cycle on them
- start a transaction
- visit the url
- elaborate it
- start a sub-transaction
  - check if the results are already in the first two tables (select)
  - if not, save it (insert)
- commit the sub-transaction
- start a sub-transaction
  - check if the results are already in another table (select)
  - if not, save it (insert)
- commit the sub-transaction
- update the row I'm visiting
- commit the main transaction
end the cycle

There are plenty of error checks here and there, the main transaction has hundreds of queries (selects and inserts), and MySQL's CPU usage goes very high (I guess because of the big rollback log), but all of this works fine.

The only problem is that I can't run more than one instance of this batch, because the instances select more or less the same rows: that means I visit a URL more than once within a few seconds, which I don't want.

If I move the start of the main transaction outside the cycle and select the rows for update, I still don't get any concurrency, because the second instance's select blocks until the first instance commits its main transaction.

A possible solution is to add a "locked" field to the first table, to be set to true (actually to the current date, as I try not to use booleans).

Another is to start the main transaction and then select just one row (for update) at a time (setting "limit 1" instead of 5 or 10 as now).

I can't think of any other way to get what I want: don't select locked rows.

Any ideas?

It sounds as though you do need some form of marker to identify rows as "in use" so that the other instances do not process the same data. Whether you use a boolean or a date type is irrelevant; somehow you must mark the rows as in use.

One way is to do this via a dispatcher: a process or thread with sole access to your table, whose only job is to select rows and pass them to other processes to work on. Even then, the dispatcher has to know how far through the data it has got, so you are back to the same problem.
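To make the dispatcher idea concrete, here is a minimal sketch in Python. It assumes the dispatcher has already read the URLs into a list; the function name `dispatch`, the worker names, and the "visit" placeholder are all hypothetical. It only shows that a single hand-off point gives each URL to exactly one worker; it does not solve the problem of remembering progress between runs.

```python
import queue
import threading


def dispatch(urls, worker_count=3):
    """Hand each URL to exactly one worker thread through a shared queue."""
    tasks = queue.Queue()
    visited = []                      # (worker_name, url) pairs, for inspection
    visited_lock = threading.Lock()

    def worker(name):
        while True:
            url = tasks.get()
            if url is None:           # sentinel: the dispatcher has no more work
                break
            # Placeholder for "visit the URL and save the results".
            with visited_lock:
                visited.append((name, url))

    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(worker_count)
    ]
    for t in threads:
        t.start()
    for url in urls:                  # only the dispatcher touches the source list
        tasks.put(url)
    for _ in threads:                 # one sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()
    return visited
```

Because `queue.Queue.get()` removes an item atomically, no two workers can receive the same URL.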

Another way is to use a field to indicate that the row is in use (as you have said in your question). Each process updates a block of rows with a unique ID, inside a transaction so that the marking is atomic with respect to the other instances. I would use the connection number returned from CONNECTION_ID() to mark them, since it is unique among the currently connected clients.

After the UPDATE ... WHERE connection_id IS NULL (with a limit applied) transaction is complete, each process can SELECT ... WHERE connection_id = CONNECTION_ID() to get its rows for processing.

When a process has completed its work, the whole cycle starts again to mark the next set of rows, until all rows have been processed.
