
CouchDB Compaction and Doc Deletion - Compaction indifferent?

I am testing a theory on a simple CouchDB database: that CouchDB compaction is totally indifferent to deleted docs.

Deleting a doc from couch via a DELETE method yields the following when trying to retrieve it:

localhost:5984/enq/deleted-doc-id {"error":"not_found","reason":"deleted"}

Expected.

Now I compact the database: localhost:5984/enq/_compact {'ok': true }

And check compaction has finished "compact_running":false

Now I would expect CouchDB to return not_found with reason "missing" on a simple GET, but I still get: localhost:5984/enq/deleted-doc-id {"error":"not_found","reason":"deleted"}

And trying with ?rev=deleted_rev gives me a full doc. Yay for worthless data.

So am I correct in thinking that CouchDB compaction gives no special treatment to deleted docs and simply looks at the rev count against the rev limit when deciding what is part of compaction? Is there a special rev_limit we can set for deleted docs?

Surely the only solution can't be a _purge? At the moment we must have thousands of orphaned deleted docs, and whilst we want to maintain some version history for normal docs, we don't want to reduce our rev_limit to 1 just to deal with this scenario.

What are the replication issues we should be aware of with purge?


Deleted documents are preserved forever (because it's essential to providing eventual consistency between replicas). So, the behaviour you described is intentional.

To delete a document as efficiently as possible use the DELETE verb, since this stores only _id, _rev and the deleted flag. You can, of course, achieve the same more manually via POST or PUT.
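As a rough sketch of the two routes mentioned above, the following builds (but does not send) a DELETE request and the manual PUT equivalent using only the Python standard library. The database URL is the one from the question; the helper names and the revision string "2-abc" are made up for illustration.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

# Database URL as it appears in the question.
BASE = "http://localhost:5984/enq"


def delete_request(doc_id, rev):
    """Build DELETE /db/doc?rev=...; no body is sent, so only a stub remains."""
    url = f"{BASE}/{quote(doc_id, safe='')}?{urlencode({'rev': rev})}"
    return Request(url, method="DELETE")


def tombstone_put_request(doc_id, rev):
    """Build the manual equivalent: a PUT whose body is only the deletion stub."""
    body = {"_id": doc_id, "_rev": rev, "_deleted": True}
    url = f"{BASE}/{quote(doc_id, safe='')}"
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# "2-abc" is a placeholder revision, not a real one.
req = delete_request("deleted-doc-id", "2-abc")
print(req.get_method(), req.full_url)
```

Either request could then be sent with urllib.request.urlopen(req). The point of the PUT variant is that its body carries nothing beyond _id, _rev and _deleted, so nothing else is left to survive compaction.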

Finally, _purge exists only for extreme cases where, for example, you've put an important password into a CouchDB document and need it to be gone from disk. It is not a recommended method for pruning a database: it will typically invalidate any views you have (forcing a full rebuild) and it interferes with replication too.


Adding a document, deleting it, and then compacting does not return the CouchDB database to a pristine state. A deleted document is retained through compaction, though in the usual case the resulting document is small (just the _id, _rev and _deleted=true). The reason for this is replication. Imagine the following:

  • Create document.
  • Replicate DB to remote DB.
  • Delete document.
  • Compact DB.
  • Replicate DB to remote DB again.

If the document is totally removed after deletion+compaction, then the second replication won't know to tell the remote DB that the document has been deleted. This would result in the two DBs being inconsistent.
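The scenario above can be played out with a toy model. This is only an illustration of the argument, not CouchDB's actual replicator: a database is modelled as a dict of doc id to (revision number, deleted flag), and all function names are invented for the sketch.

```python
# Toy model: a database is a dict mapping doc id -> (rev number, deleted flag).
# It illustrates the argument only; it is not how CouchDB is implemented.

def replicate(source, target):
    """Push every entry the source still holds, tombstones included."""
    for doc_id, (rev, deleted) in source.items():
        target_rev = target.get(doc_id, (0, False))[0]
        if rev > target_rev:
            target[doc_id] = (rev, deleted)


def compact(db, keep_tombstones):
    """Return a compacted copy, optionally dropping deleted-doc stubs."""
    return {
        doc_id: (rev, deleted)
        for doc_id, (rev, deleted) in db.items()
        if keep_tombstones or not deleted
    }


def visible_docs(db):
    """Ids of documents a plain GET would still return."""
    return sorted(doc_id for doc_id, (_, deleted) in db.items() if not deleted)


def run_scenario(keep_tombstones):
    local, remote = {}, {}
    local["doc1"] = (1, False)               # create document
    replicate(local, remote)                 # replicate DB to remote DB
    local["doc1"] = (2, True)                # delete document (leaves a stub)
    local = compact(local, keep_tombstones)  # compact DB
    replicate(local, remote)                 # replicate again
    return visible_docs(local), visible_docs(remote)


print(run_scenario(keep_tombstones=True))   # ([], [])
print(run_scenario(keep_tombstones=False))  # ([], ['doc1'])
```

With the stub kept, both sides agree that doc1 is gone. With the stub dropped during compaction, the second replication has nothing to send and the remote copy keeps serving doc1.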

There was an issue reported that could result in the document in the DB not being small; however it did not pertain to the HTTP DELETE method AFAIK (though I could be wrong). The ticket is here:

https://issues.apache.org/jira/browse/COUCHDB-1141

The basic idea is that audit information can be included with the DELETE that will be kept through compaction. Make sure you aren't posting the full doc body with the DELETE method (doing so might explain why the document isn't actually removed).


To clarify... from our experience you have to kick off a DELETE with the id and then a compact in order to fully remove the document data.

As pointed out above you will still have the "header data" in your database afterwards.
