What are the risks and ramifications of changing the document validation criteria in a running Couch database?
To take the simplest possible example:
- Start with an empty database.
- Add a document.
- Add a design document with a validation function that rejects everything.
- Replicate that database.
To begin with a concrete question, one that I hope can be answered very quickly by pointing me to the right URL: is the result of this replication defined by some rule, for example that documents are always replicated in the order they were saved, or does the successful replication of the first document depend on whether the design document happened to arrive at the destination first? In the quick experiment I did, both documents were successfully validated, but I'm trying to find out whether that outcome is defined in a spec somewhere or is implementation-dependent.
To ask a follow-up question that's more hand-wavy and may not have a single answer: what else can happen, and what sorts of solutions have emerged to manage those problems? It's obviously possible for different servers to simultaneously (and I use that word hesitantly) have different versions of a validation function. I suppose the validators could be backwards compatible, where every new version adds a case to a switch statement that looks up, say, a schema_version attribute of the document. Then if a version 2 document arrives at a server where the version 3 validator is the gatekeeper, it'll be allowed in. If a version 3 document arrives at a version 2 validator, it's a bit more tricky; it presumably depends on whether strictness or leniency is an appropriate default for the application. But can either of those things even happen? Or do the replication rules ensure that, even if servers are going up and down, updates and deletes are being done all over the place, and replication connections are intermittent and indirect, a document will never arrive on a given server before its appropriate validation function, and a validation function will never arrive too late to handle one of the documents it was supposed to check?
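The backwards-compatible validator idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the `title` and `tags` fields, the `ACCEPT_UNKNOWN_VERSIONS` flag and the name `validateDocUpdate` are hypothetical, and only the argument list and the `throw({forbidden: ...})` convention are taken from CouchDB's `validate_doc_update` functions.

```javascript
// Sketch of a version-aware validator. In a real design document this
// function body would be stored as a string under "validate_doc_update".
function validateDocUpdate(newDoc, oldDoc, userCtx, secObj) {
  // Hypothetical policy switch: false = strict, true = lenient towards
  // documents written for a schema this validator has never heard of.
  var ACCEPT_UNKNOWN_VERSIONS = false;

  // Let deletions through; tombstones carry no schema fields.
  if (newDoc._deleted) {
    return;
  }

  var ok;
  switch (newDoc.schema_version) {
    case 1: // hypothetical version 1 rule
      ok = typeof newDoc.title === "string";
      break;
    case 2: // hypothetical version 2 rule: version 1 plus a tags array
      ok = typeof newDoc.title === "string" && Array.isArray(newDoc.tags);
      break;
    default: // a newer (or missing) version than this validator knows
      ok = ACCEPT_UNKNOWN_VERSIONS;
  }

  if (!ok) {
    throw({ forbidden: "rejected for schema_version " + newDoc.schema_version });
  }
}
```

With the flag set to `false`, a version 3 document reaching this version 2 validator is rejected; flipping it to `true` gives the lenient default instead.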
I could well be overcomplicating this or missing out on some Zen insight, but painful experience has taught me that I'm not clever enough to predict what sorts of states concurrent systems can get themselves into.
EDIT:
As Marcello says in a comment, updates on individual servers have sequence numbers, and replication applies the updates in sequence number order. I had a vague idea that that was the case, but I'm still fuzzy on the details. I'm trying to find the simplest possible model that will give me an idea about what can and can't happen in a complex CouchDB system.
Suppose I take the state of server A that's started off empty and has three document writes made to it. So its state can be represented as the following string: A1,A2,A3
Suppose server B also has three writes: B1,B2,B3
We replicate A to B, so the state of B is now: B1,B2,B3,A1,A2,A3. Although presumably the A updates have taken a sequence number on entering B, so the state is now: B1, B2, B3, B4(A1), B5(A2), B6(A3).
If I understand correctly, the replicator also makes a record of the fact that everything up to A3 has been replicated to B, and it happens to store this record as part of B's internal state, but I'm wondering if this is an implementation detail that can be disregarded in the simple model.
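To make the string model concrete, here is a toy simulation of it. It is a deliberately simplified sketch, not CouchDB's actual replicator: `makeServer`, `localWrite`, `replicate` and `describe` are made-up names, each update is treated as an independent item, and the checkpoint is kept on the target only because that is how the paragraph above describes it.

```javascript
// Toy model: a server is an ordered log of updates; the position in the
// log plays the role of the local sequence number.
function makeServer(name) {
  return { name: name, log: [], checkpoints: {} };
}

// A local write gets the next local sequence number as its label, e.g. "A1".
function localWrite(server) {
  server.log.push(server.name + (server.log.length + 1));
}

// Copy everything after the recorded checkpoint, in source sequence order.
function replicate(source, target) {
  const since = target.checkpoints[source.name] || 0;
  for (let i = since; i < source.log.length; i++) {
    const label = source.log[i];
    if (!target.log.includes(label)) {
      target.log.push(label); // takes the next sequence number on the target
    }
  }
  target.checkpoints[source.name] = source.log.length;
}

// Render the log in the question's notation, e.g. "B4(A1)".
function describe(server) {
  return server.log
    .map(function (label, i) {
      const local = server.name + (i + 1);
      return local === label ? local : local + "(" + label + ")";
    })
    .join(", ");
}

const serverA = makeServer("A");
const serverB = makeServer("B");
for (let n = 0; n < 3; n++) {
  localWrite(serverA);
  localWrite(serverB);
}
replicate(serverA, serverB);
console.log(describe(serverB)); // B1, B2, B3, B4(A1), B5(A2), B6(A3)
```

Running `replicate(serverA, serverB)` a second time changes nothing in this model, because the checkpoint already points past A3.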
If you operate those sets of rules, the A updates and the B updates would stay in order on any server they were replicated to. Perhaps the only way they could get out of order is if you did something like replicating A to B, deleting A1 on A and A2 on B, replicating A to C, then replicating B to C, leaving a state on C of: A2, A3, B1, B2, B3, B4(A1).
Is this making any sense at all? Maybe strings aren't the right way of visualising it. Maybe it's better to think of, I don't know, a bunch of queues (servers) in an airport, with airport staff (replicators) moving people from queue to queue according to certain rules, and to put yourself into the mind of someone trying to skip the queue, i.e. somehow get into a queue before someone who's ahead of them in their current queue. That has the advantage of personalising the model, but we probably don't want to replicate people in airports.
Or maybe there's some way of explaining it as a Towers of Hanoi type game, although with FIFO queues instead of LIFO stacks.
It's a model I'm hoping to find - absolutely precise as far as behavior is concerned, all irrelevant implementation details stripped away, and using whatever metaphor or imagery makes it easiest to intuit.
The basic use case is simple. CouchDB uses sequence numbers to index database changes and to ask what changes need to be replicated. Order is implicit in this algorithm and what you fear should not happen. As a side note, the replication process only copies the last revision of a document, but this does not change anything about order.
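As a rough illustration of that side note, here is a toy changes index. It assumes, as a simplification, that each document appears once, at the sequence number of its latest update, and that a reader walks the index in ascending sequence order; `makeDb`, `saveDoc` and `changesSince` are hypothetical names, not CouchDB APIs.

```javascript
// Toy changes index: one entry per document id, at its latest sequence number.
function makeDb() {
  return { seq: 0, latest: new Map() };
}

// Saving a document (new or updated) assigns it the next sequence number
// and replaces any earlier entry for the same id.
function saveDoc(db, id) {
  db.seq += 1;
  db.latest.set(id, db.seq);
}

// Everything changed after `since`, in ascending sequence order.
function changesSince(db, since) {
  return Array.from(db.latest)
    .filter(function (entry) { return entry[1] > since; })
    .map(function (entry) { return { seq: entry[1], id: entry[0] }; })
    .sort(function (x, y) { return x.seq - y.seq; });
}

const exampleDb = makeDb();
saveDoc(exampleDb, "doc1");             // seq 1
saveDoc(exampleDb, "_design/validate"); // seq 2
saveDoc(exampleDb, "doc1");             // seq 3, replaces the seq 1 entry
console.log(changesSince(exampleDb, 0));
// [ { seq: 2, id: '_design/validate' }, { seq: 3, id: 'doc1' } ]
```

In this model only the latest revision of `doc1` is offered for replication, yet the reader still moves through the index strictly by sequence number.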