
"Unknown delivery tag" from RabbitMQ when ack'ing a message in a cluster with replicated queues

We've been using Rabbit successfully for about a year. We recently upgraded to v2.6.1 because we want to use clusters with replicated message queues.

My testing has hit a puzzling behavior that smells like a Rabbit bug to me. The test that uncovers this is working with a two-node cluster. Both nodes are running v2.6.1. Both nodes have disk. Both nodes are running on Mac OS, though I doubt this is pertinent.

I'm also running Alice on the node that runs the test. The test uses it to programmatically do a stop_app on one of the nodes, because the test is trying to validate that if the cluster master fails and a slave is promoted to take its place, we don't lose messages.

So, the test has a small thread pool, which is given tasks that periodically 1) publish messages, and 2) toggle the state of the Rabbit master node (stopped if running; started if stopped). Other threads are consuming messages from queues.

I'm using publisher confirms, and I'm also acknowledging the messages in the consumers (using autoAck=false for channel.basicConsume()).

When the master node is stopped, I see both the producers and consumers catching ShutdownSignalException. They handle this by attempting to reconnect to the cluster. This works fine. When reconnected, they continue with their business.

Sometimes, what I see is that a consumer has successfully fetched a message from the broker, and is calling channel.basicAck() when it gets that ShutdownSignalException.

Later, when the consumer has reconnected, it again pulls down the same message. (The message bodies are tagged with a UUID, so I know it is the same one.) This time, when the consumer attempts to basicAck() the message, it again gets ShutdownSignalException, but this one has the following text in it: "reply-text=PRECONDITION_FAILED - unknown delivery tag 7".

In fact, that is the same delivery tag that was offered to the consumer by the broker before the master went down and the consumer reconnected.

Googling suggests that this event means that the consumer is attempting to ack the same message more than once.

But, how can this be so? If the first ack succeeded, then the message should have been removed from the broker's queues, and the consumer shouldn't see the same message again.

Yet, if the first ack did not succeed, then the consumer shouldn't be dinged for attempting to re-ack the message.

Anyone seen this before? It smells like a bug in Rabbit's replicated queues to me, but I'm still new to Rabbit, and so am willing to believe there's a subtlety here in consuming from a clustered broker that I haven't yet grokked!

Thanks, --Steve


I'm not sure if my case matches yours, but I have seen a similar "unknown delivery tag" error on attempts to ack after a reconnect, followed by the same message arriving again. Initially it looked like a bug to me, but in fact this is expected behavior. A consumer with QoS > 1 may hold some messages in its local buffer, and the delivery tags will be invalid for all of them after a reconnect. On the other hand, attempting to ack even the current message after a reconnect doesn't make sense, because that message was already nacked automatically when the connection was lost, which is why I got it again.
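
To make that concrete, here is a minimal sketch of one way a consumer could guard against acking stale deliveries. It relies on delivery tags being scoped to the channel they arrived on, so a tag received before a reconnect means nothing on the new channel. The `AckGuard` class and all of its method names are hypothetical, not part of the RabbitMQ Java client; the sketch only models the bookkeeping, and the real `channel.basicAck()` call would go wherever `canAck()` returns true. It also assumes each message carries a unique ID (such as the UUID in the question) so that redeliveries can be recognized.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not a RabbitMQ client class): remembers which channel
// "generation" each delivery arrived on, so stale tags are never acked.
class AckGuard {

    // A delivery tag paired with the channel generation it was received on.
    static final class Delivery {
        final long channelGeneration;
        final long deliveryTag;
        final String messageId;

        Delivery(long channelGeneration, long deliveryTag, String messageId) {
            this.channelGeneration = channelGeneration;
            this.deliveryTag = deliveryTag;
            this.messageId = messageId;
        }
    }

    private long generation = 0;
    private final Set<String> processedIds = new HashSet<>();

    // Call after every successful reconnect: tags from older channels are now invalid.
    synchronized void onReconnect() {
        generation++;
    }

    // Call when a message arrives; stamps it with the current channel generation.
    synchronized Delivery received(long deliveryTag, String messageId) {
        return new Delivery(generation, deliveryTag, messageId);
    }

    // True only if the delivery arrived on the current channel, so acking it is safe.
    synchronized boolean canAck(Delivery d) {
        return d.channelGeneration == generation;
    }

    // Returns true the first time an ID is seen, false for an already-processed redelivery.
    synchronized boolean markProcessed(String messageId) {
        return processedIds.add(messageId);
    }
}
```

In use, the consumer would call `onReconnect()` from its reconnect handler, drop any buffered `Delivery` for which `canAck()` is false instead of acking it, and rely on the broker to redeliver those messages. `markProcessed()` lets the consumer skip the work for a redelivered message it has already handled, while still acking the new delivery on the new channel.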
