
Traversing a Berkeley DB database in store order

When using cursors in Berkeley DB JE I found that traversing a dataset generates a lot of random read I/O. This happens because BDB traverses the dataset in primary-key ascending order.

In my application I have no requirement to process the dataset in any particular order (mathematically speaking, my operation is commutative), and I am interested in maximizing throughput.

Is there any way to process the dataset with a cursor in store order rather than in primary-key order?


I would guess not; BDBJE is a log-structured database, i.e. all writes are appended to the end of a log. This means that records are always appended to the most recent log file and may supersede records in earlier logs. Because BDBJE by design never writes to old logs, it cannot mark old records as superseded, so you cannot simply walk forward through the storage processing records: you have no way of knowing whether a record is current without having already seen the records later in the log.

BDBJE will clean old logs as their "live" record count diminishes by copying the live records forward into new logs and deleting the old files, which shuffles the ordering yet more.

I found the Java binding of Kyoto Cabinet to be faster than BDB for raw insert performance, and you have a choice of storage formats, which may allow you to optimize cursor-ordered traversal performance. The licensing is similar (Kyoto Cabinet is GPL3; BDB uses the copyleft Oracle BDB license) unless you pay for a commercial license in either case.

Update: As of version 5.0.34, BDBJE includes the DiskOrderedCursor class, which addresses this exact use case: it traverses records in log sequence, which in an unfragmented log file should be the same as disk order.
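
For reference, here is a minimal, untested sketch of what a disk-ordered scan might look like in JE 5.x; the environment path, database name, and per-record processing are placeholders, and I haven't benchmarked it:

    import java.io.File;
    import com.sleepycat.je.*;

    public class DiskOrderScan {
        public static void main(String[] args) throws Exception {
            // Open an existing environment and database (path and name are placeholders).
            Environment env = new Environment(new File("/path/to/env"),
                                              new EnvironmentConfig());
            Database db = env.openDatabase(null, "myDatabase", new DatabaseConfig());

            // Request a cursor that returns records in log (disk) order, not key order.
            DiskOrderedCursorConfig docConfig = new DiskOrderedCursorConfig();
            DiskOrderedCursor cursor = db.openCursor(docConfig);

            DatabaseEntry key = new DatabaseEntry();
            DatabaseEntry data = new DatabaseEntry();
            try {
                while (cursor.getNext(key, data, LockMode.READ_UNCOMMITTED)
                        == OperationStatus.SUCCESS) {
                    process(key.getData(), data.getData()); // commutative work, order irrelevant
                }
            } finally {
                cursor.close();
                db.close();
                env.close();
            }
        }

        private static void process(byte[] key, byte[] data) {
            // Placeholder for the actual per-record operation.
        }
    }

Since the records come back in whatever order they sit in the log files, this only pays off when, as in your case, the processing doesn't depend on key order.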


There are new "bulk-access" interfaces available that allow one to read multiple, presumably contiguous, records into a buffer using either of the Db#get() or Dbc#get() methods in concert with the DB_MULTIPLE flag.

That documentation is for version 4.2.52, and I had some trouble finding documentation for the com.sleepycat.db package on Oracle's site. Here I found the documentation for version 4.8.30, but the classes Db and Dbc are not mentioned there.

Ah, classes MultipleEntry and MultipleDataEntry look to be promising equivalents to the use of DB_MULTIPLE above. The idea is that when you fetch data using, say, MultipleDataEntry with a suitably-sized buffer, you'll get back a whole bunch of records together that can then be picked apart using MultipleDataEntry#next().
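
In case it helps, here is a rough, untested sketch of how I read the bulk interface from the Javadoc. I've used MultipleKeyDataEntry (the key/data sibling of MultipleDataEntry, also a MultipleEntry subclass) because a whole-database scan wants key/data pairs back; the file name, buffer size, and per-record processing are placeholders, and the exact calls may differ between versions:

    import com.sleepycat.db.*;

    public class BulkScan {
        public static void main(String[] args) throws Exception {
            // Open an existing database; file name and config are placeholders.
            Database db = new Database("mydb.db", null, new DatabaseConfig());
            Cursor cursor = db.openCursor(null, null);

            // Passing a Multiple*Entry backed by a user-supplied buffer is, as I
            // understand it, how the Java API requests bulk retrieval: each cursor
            // call fills the buffer with as many key/data pairs as will fit.
            int bufferSize = 1024 * 1024;
            MultipleKeyDataEntry bulk = new MultipleKeyDataEntry(new byte[bufferSize]);
            bulk.setUserBuffer(bufferSize, true);

            DatabaseEntry cursorKey = new DatabaseEntry(); // required by the call signature
            DatabaseEntry recKey = new DatabaseEntry();
            DatabaseEntry recData = new DatabaseEntry();
            try {
                while (cursor.getNext(cursorKey, bulk, null) == OperationStatus.SUCCESS) {
                    // Pick the individual records back out of the buffer.
                    while (bulk.next(recKey, recData)) {
                        process(recKey.getData(), recData.getData());
                    }
                }
            } finally {
                cursor.close();
                db.close();
            }
        }

        private static void process(byte[] key, byte[] data) {
            // Placeholder for the actual per-record work.
        }
    }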

I get the impression that this part of the interface has been in flux. As I don't have a fresh enough version of the library available on my project, I can't claim to have used these bulk-fetching interfaces yet. Please report back if you're able to investigate their use.
