Technology for a reliable, persistent stack

Posted 2023-01-13 08:58

Trying a mental reset here: I tried to create a reliable, persistent stack with MSMQ, and it didn't work.

So in more general terms:

I have a producer (a web service, so multithreaded although "only one") / consumer (multiple processes, as many as needed) setup. The key problems are:
- The data needs to be consumed/processed in LIFO order (~> stack)
- The data needs to be stored/handled in a reliable way (i.e. backed by a disk, message queue, whatever). Bonus points for transaction support.
- Interprocess communication is involved

Given the points above I struggle to find a neat solution. What I looked at:

Do it yourself: I didn't really plan to do that, but initial proofs of concept just confirmed that this is hard (for me) and helped me get a better grasp of the many hurdles involved.
MSMQ: Would be nice and easy, since it lends itself to "reliable", is easy to set up and is already part of the target infrastructure. Unfortunately "LIFO"/"stack" is a killer here. That seems to be impossible to do -> Bzzzt.
Database (SQL Server): I tried to look at a DB-based approach, but there are lots of ugly things involved:
- I'd need to store my data as a blob (since it doesn't lend itself to a column-based store easily)
- Polling a database for work just seems wrong (is it?)
- Locking with multiple consumers seems to be tricky

Any suggestions for a technology that I should evaluate? The database-based approach seems to be the most "promising" so far, but I still haven't found good examples/success stories of similar use cases.

Updates

- Windows only
- For now, I don't even need to do inter-machine communication (i.e. producer and consumers will probably be on one machine for now)
- The key part of the question, the difficult task for me, is: I cannot lose a job/message, even if all processes go down. A DB would give me that "for free"; message queues can be set up to be reliable. Map/reduce, while interesting, doesn't solve the core issue: how do I make sure that messages/jobs aren't lost?

I'd go with SQL Server for this.

1. Obviously you'd have to serialize your data to a blob, but any solution would have to do this (at least behind the scenes). You would then just have a table like CREATE TABLE Stack (Id int identity, Data varbinary(MAX))
2. Polling the database isn't necessary. SQL Server has a query notification service: you give it a query and it notifies you when the results would be different. Query notifications do not accept SELECT * and require two-part table names, so your notification query would be something like SELECT Id FROM dbo.Stack
3. Locking is the database's problem, not yours. You would just have every consumer run a query (or stored procedure) that uses a transaction to return the most recent entry (the row with the highest Id) and delete it at the same time. If the query returns a result, process it and run the query again. If the query returns no results, wait for the notification from #2.
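
A minimal sketch of the producer side of the first point above. It assumes Python with the standard-library sqlite3 module standing in for SQL Server, purely so that it runs without a server. The names open_stack and push, the JSON serialization and the sample jobs are illustrative assumptions, not part of the original answer.

```python
import json
import sqlite3


def open_stack(path):
    """Open (or create) the Stack table. SQLite stands in for SQL Server here."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Stack ("
        "Id INTEGER PRIMARY KEY AUTOINCREMENT, Data BLOB NOT NULL)"
    )
    conn.commit()
    return conn


def push(conn, job):
    """Serialize a job to bytes and store it; returns the new row's Id."""
    blob = json.dumps(job).encode("utf-8")
    with conn:  # commits on success, rolls back if the INSERT raises
        cur = conn.execute("INSERT INTO Stack (Data) VALUES (?)", (blob,))
    return cur.lastrowid


conn = open_stack(":memory:")  # use a file path for real persistence
first = push(conn, {"task": "resize", "file": "a.png"})
second = push(conn, {"task": "resize", "file": "b.png"})
```

Because the Id is assigned by the database in insertion order, the row with the highest Id is always the most recently pushed job, which is what makes the table behave like a stack.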

Here's a sample query:

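BEGIN TRANSACTION
-- Lock the newest row and remember its Id, so that a concurrent insert or
-- another consumer cannot change which row gets read and deleted
DECLARE @Id int = (SELECT MAX(Id) FROM Stack WITH (UPDLOCK, HOLDLOCK))
SELECT Data FROM Stack WHERE Id = @Id
DELETE FROM Stack WHERE Id = @Id
COMMIT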

Here's a more elegant version that doesn't even require an explicit transaction:

DELETE Stack
OUTPUT DELETED.Data
WHERE Id = (SELECT MAX(Id) FROM Stack)
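
To connect this with the "cannot lose a job" requirement, here is a hedged sketch of a consumer that pops the newest row and processes it inside one transaction, so that a failure during processing puts the row back. It again assumes Python's sqlite3 in place of SQL Server; pop_and_process and the handler callback are hypothetical names introduced for illustration.

```python
import sqlite3


def pop_and_process(conn, handler):
    """Pop the newest row and run handler on it inside one transaction.

    Returns False when the stack is empty. If handler raises, the
    transaction is rolled back and the row stays on the stack.
    """
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT Id, Data FROM Stack ORDER BY Id DESC LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("ROLLBACK")
            return False
        conn.execute("DELETE FROM Stack WHERE Id = ?", (row[0],))
        handler(row[1])
        conn.execute("COMMIT")  # the row is gone only after handler succeeded
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None lets the code issue BEGIN/COMMIT itself
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Stack (Id INTEGER PRIMARY KEY AUTOINCREMENT, Data BLOB)")
for payload in (b"first", b"second", b"third"):
    conn.execute("INSERT INTO Stack (Data) VALUES (?)", (payload,))

seen = []
while pop_and_process(conn, seen.append):
    pass
```

Keeping the transaction open during processing means a crash puts the job back, at the cost of holding a lock for as long as the handler runs.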

If you want to do batch processing of 10 items at a time, you would use SQL like this:

DELETE Stack
OUTPUT DELETED.*
WHERE Id IN (SELECT TOP 10 Id FROM Stack ORDER BY Id DESC)
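
The batch variant can be sketched the same way. This is an illustration under the same assumptions (Python's sqlite3 standing in for SQL Server); pop_batch and the sample payloads are hypothetical.

```python
import sqlite3


def pop_batch(conn, size):
    """Remove and return up to `size` of the newest rows, newest first."""
    conn.execute("BEGIN IMMEDIATE")  # one transaction for select + delete
    try:
        rows = conn.execute(
            "SELECT Id, Data FROM Stack ORDER BY Id DESC LIMIT ?", (size,)
        ).fetchall()
        conn.executemany(
            "DELETE FROM Stack WHERE Id = ?", [(row[0],) for row in rows]
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return [data for _, data in rows]


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Stack (Id INTEGER PRIMARY KEY AUTOINCREMENT, Data BLOB)")
for i in range(12):
    conn.execute("INSERT INTO Stack (Data) VALUES (?)", (f"job-{i}".encode(),))

batch1 = pop_batch(conn, 10)  # the ten newest jobs
batch2 = pop_batch(conn, 10)  # the two that remain
```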

You should check out AMQP. I'm digging around on Google at the moment and unfortunately have no reason to believe that it can maintain a stack instead of a queue, but there are several open-source implementations, and aside from the FIFO vs. LIFO issue it's a good fit for what you want.

I don't think the database table is a bad idea either. As long as you don't need to scale past a couple thousand transactions per second, you should be just fine.

If you're going down the DB route, you could look at Triggers. It kind of depends on how sparse your messages are and how long you can wait to process them.

For point 3, you could look at this by the SO fanatic Jon Skeet, a means of serializing data to a binary blob that can easily be dumped....

With respect to interprocess communication: what platform are we talking about here? If it's Windows communicating with other Windows machines, would WCF not be suitable? As for transaction support, most of ADO.NET has transaction support (as per the MSDN article), unless you are talking about transactional filesystem support as per this blog entry, or even using the System.Transactions namespace as clarified here with respect to distributed transactions.

MapReduce sounds perfect for this and can be super scalable, since it's what Google uses for indexing web pages. Not sure what your preferred stack is, but you may want to check out Hadoop.
