Sample a large CouchDB database for local development, avoiding long view builds
CouchDB makes it convenient to develop CouchApps locally and then push them to a remote production server. Unfortunately, with production-sized data sets, working on views can be cumbersome because every view change triggers a long rebuild.
What are good ways to take samples of a CouchDB database for use in local development?
The answer is filtered replication. I like to do this in two parts:
- Replicate the production database, example_db, to my local server as example_db_full.
- Perform filtered replication from example_db_full to example_db, where the filter cuts out enough data so builds are fast, but keeps enough data so I can confirm my code works.
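As a rough sketch, the two steps above correspond to two request bodies for CouchDB's /_replicate endpoint. The local server address, the production URL, and the design document and filter name "sampling/random" are assumptions made for illustration; the filter function has to be stored in a design document in the source database of the filtered replication.

```javascript
// Assumption: a local CouchDB at this address; adjust to your setup.
var LOCAL = "http://localhost:5984";

// Hypothetical "designdoc/filtername" under which the filter is stored
// in the source database of the filtered replication.
var FILTER = "sampling/random";

// Step 1: plain replication of production into a full local copy.
function fullCopyBody(productionUrl) {
  return {
    source: productionUrl,
    target: LOCAL + "/example_db_full",
    create_target: true
  };
}

// Step 2: filtered replication from the full copy into the sample.
// query_params is exposed to the filter function as req.query.
function sampleBody(fraction) {
  return {
    source: LOCAL + "/example_db_full",
    target: LOCAL + "/example_db",
    create_target: true,
    filter: FILTER,
    query_params: { p: String(fraction) }
  };
}

// Each object is the JSON body of a POST to LOCAL + "/_replicate".
console.log(JSON.stringify(fullCopyBody("https://production.example.com/example_db")));
console.log(JSON.stringify(sampleBody(0.01)));
```

Passing the fraction as a string keeps a value of 0 from being mistaken for a missing parameter by a truthiness check in the filter.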
Which documents to select can be application-specific. For now, I am satisfied with a simple random pass/fail at a percentage that I can specify. The randomness is consistent (i.e., the same document revision always passes or always fails).
My technique is to normalize the content checksum in the document's _rev field onto the range [0.0, 1.0). Then I simply specify some fraction (e.g. 0.01), and if the normalized checksum value is <= my fraction, the document passes.
    function(doc, req) {
      // Always replicate design documents.
      if(/^_design\//.test(doc._id))
        return true;

      if(!req.query.p)
        throw {error: "Must supply a 'p' parameter with the fraction"
                      + " of documents to pass [0.0-1.0]"};

      var p = parseFloat(req.query.p);
      if(!(p >= 0.0 && p <= 1.0)) // Also catches NaN
        throw {error: "Must supply a 'p' parameter with the fraction of documents"
                      + " to pass [0.0-1.0]"};

      // Consider the first 8 characters of the doc checksum (for now, taken
      // from _rev) as a real number on the range [0.0, 1.0), i.e.
      // ["00000000", "ffffffff").
      var ONE = 4294967295; // parseInt("ffffffff", 16);
      var doc_val = parseInt(doc._rev.match(/^\d+-([0-9a-f]{8})/)[1], 16);

      return doc_val <= (ONE * p);
    }
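
To see how the comparison behaves, here is a small standalone sketch that mirrors the filter's arithmetic outside CouchDB. The helper name passes and the sample _rev strings are made up for illustration, and returning false on an unexpected _rev format is a choice of this sketch, not something the filter above does.

```javascript
// Hypothetical helper: same comparison as the filter, without CouchDB.
function passes(rev, p) {
  var m = /^\d+-([0-9a-f]{8})/.exec(rev);
  if (!m) return false; // unexpected _rev format: fail, in this sketch only
  var ONE = 4294967295; // parseInt("ffffffff", 16)
  return parseInt(m[1], 16) <= ONE * p;
}

// Made-up revisions: only the first 8 hex digits after "N-" matter.
var lowRev = "1-00ffffff000000000000000000000000";  // 16777215
var midRev = "3-80000000000000000000000000000000";  // 2147483648

console.log(passes(lowRev, 0.01)); // 16777215 <= 42949672.95
console.log(passes(midRev, 0.01)); // 2147483648 is far above the threshold
console.log(passes(midRev, 0.75)); // threshold is now 3221225471.25
```

Because the value comes from _rev, updating a document gives it a new checksum, so a document can move in or out of the sample between revisions.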