Concurrency issue with database queue, need portable solution
I've got an incoming message queue implemented in 开发者_如何学Goa SQL database (currently MS SQL Server), accessible via a web service. This web service runs a legacy protocol we have no control over, and must return either a 'no messages in queue' response, or both:
- Details of the oldest message in the queue
- The number of other messages in the queue
The algorithm used for this is currently very crude:
- Select all messages in queue
- Number of messages remaining = row count - 1
- Update oldest message to set collected flag
We are moving to a scenario where this may now be concurrently accessed, and have obviously run into concurrency problems where the same message gets picked up twice.
The issue we have is that we need a portable solution with minimal deployment requirements. Ideally that means no stored procedures, and no engine-specific locks.
Does anyone have any bright ideas?
Thanks!
Since you are already using SQL server you should look into using SQL Server Service Broker.
However this ties you to SQL server...
One thing you might consider is you do a select... get a ROW ID ... update the ROW to mark it as collected with a new rowID (guid or such) that you generated... the select it AGAIN... if the row id is the one you set then you are the one who got it... if not someone got in before you and you have to get go get another.
Basically you are implementing dirty read protection yourself... because you don't want to use a transaction or stored proc...
Ultimately, given the constraints you describe, dequeuing must be done one message at time - however there's a lot more involved in getting the message to the subscriber than just the database transaction (all the webservice handling code, network and HTTP latency).
The current system does not obviously implement much in the way of transactional control - so I assume that's not an issue for the concurrent solution either.
If it were me I'd set up a stored procedure (why do you think they're not portable?) with the operations in a transaction with a serializable lock - I'd also cache the number of queue messages with a TTL based on the number of operations / age of the cached value to reduce the number of costly aggregate operations on the underlying table. This could be done by issuing DML an operation at a time from the controllnig language - but the net effect is the operation will take significantly longer.
An alternative approach (which makes implementing locking more easy) would be to shard the data - not sure if MSSQL supports named locks - certainly its not a very portable solution - so splitting the data across seperate tables and using table locks. This makes implementing transactional control a lot simpler - but there are problems with load-blanacing / in-sequence processing (but again, the concurrent approach applies that the latter is not a big issue).
精彩评论