Global Temporary Tables - locking rows + Concurrency question
I have a list of 100 en开发者_C百科tries that I want to process with multiple threads. Each thread will take up to 20 entries to process.
I'm currently using global temp tables to store the entries that meet certain criteria -- I also do not want threads to overlap entries to process.
How do I do this (preventing the overlap)?
Thanks!
If on 11g, I'd use the SELECT ... FOR UPDATE SKIP LOCKED
.
If on a previous version, I'd use Advanced Queuing to populate a queue with the primary key values of the entries to be processed, and have your threads dequeue those keys to process those records. Because the dequeue can (but doesn't have to be, if memory serves) within the processing transactional scope, the dequeue commits or rolls back with the processing, and no two threads can get the same records to process.
There are two issues here, so let's handle them separately:
How do you split some work among several threads/sessions?
You could use Advanced Queuing or the SKIP LOCKED feature as suggested by Adam.
You could also use a column that contains processing information, for example a STATE column that is empty when not processed. Each thread would start work on a row with:
UPDATE your_table
SET state='P'
WHERE STATE IS NULL
AND rownum = 1
RETURNING id INTO :id;
At this point the thread would commit to prevent other thread being locked. Then you would do your processing and select another row when you're done.
Alternatively, you could also split the work beforehand and assign each process with a range of ids that need to be processed.
How will temporary tables behave with multiple threads?
Most likely each thread will have its own Oracle session (else you couldn't run queries in parallel). This means that each thread will have its own virtual copy of the temporary table. If you stored data in this table beforehand, the threads will not be able to see it (the temp table will always be empty at the beginning of a session).
You will need regular tables if you want to store data accessible to multiple sessions. Temporary tables are fine for storing data that is private to a single session, for example intermediate data in a complex process.
Easiest will be to use DBMS_SCHEDULER to schedule a job for every row that you want to process. You have to pass a key to a permanent table to identifiy the row that you want to process, or put the full row in the arguments for the job, since a temporary table's content is not visible in different sessions. The number of concurrent jobs are controlled by resource manager, mostly limited by the number of cpus.
Why would you want to process row by row anyway? Set operations are in most occasions a lot faster.
精彩评论