HBase wide-column scanning and fetching
Let's say I've created a table
rowkey (attrId+attr_value) //compound key
column => doc:doc1, doc:doc2, ...
When I use the scan feature, I fetch one row at a time inside the iterator. What if the column qualifiers reach millions of entries? How do you loop through that, and will there be a caching issue?
Thanks.
Scans fetch rows. You can qualify a scan so that it only fetches given qualifiers or given families, but then that is all that will be returned from the scan (and you can only filter on data that is included in a scan).
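For example, with the client-side Scan API (a minimal sketch; the "doc" family comes from the question, the variable names are mine):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

Scan familyScan = new Scan();
familyScan.addFamily(Bytes.toBytes("doc"));                        // every qualifier in the "doc" family

Scan columnScan = new Scan();
columnScan.addColumn(Bytes.toBytes("doc"), Bytes.toBytes("doc1")); // only the doc:doc1 column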
If you have potentially millions of columns in a single row, that could be an issue: returning that row could mean a very large network transfer. If your row size exceeds your region size, it can also cause OOM errors on your region servers, and storage becomes inefficient (one row per region).
However, ignoring all of that, you can loop through the columns and column qualifiers in the client. You can get a Map from the Result that maps from families to qualifiers to values, though that is probably not what you really want to do.
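That client-side loop could look like this (a sketch; scanner is assumed to come from table.getScanner(scan), and getNoVersionMap() is the Result variant that keeps only the latest value per column):

import java.util.Map;
import java.util.NavigableMap;
import org.apache.hadoop.hbase.client.Result;

for (Result r : scanner) {
    // family -> qualifier -> value (latest version only)
    NavigableMap<byte[], NavigableMap<byte[], byte[]>> map = r.getNoVersionMap();
    for (Map.Entry<byte[], NavigableMap<byte[], byte[]>> family : map.entrySet()) {
        for (Map.Entry<byte[], byte[]> column : family.getValue().entrySet()) {
            System.out.println(Bytes.toString(column.getKey()) + " = "
                    + Bytes.toString(column.getValue()));
        }
    }
}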
You can work around giant row fetches with a mixture of scans and column filters:
Scan s = new Scan();
s.setStartRow(Bytes.toBytes("some-row-key")); // setStartRow/setStopRow take byte[], not String
s.setStopRow(Bytes.toBytes("some-row-key"));  // start == stop restricts the scan to this single row
Filter f = new ColumnRangeFilter(Bytes.toBytes("doc0000"), true,   // lower bound, inclusive
                                 Bytes.toBytes("doc0100"), false); // upper bound, exclusive
s.setFilter(f);
Source: http://hadoop-hbase.blogspot.com/2012/01/hbase-intra-row-scanning.html
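Running that scan could look like this (a sketch; "mytable" is a placeholder, and s is the Scan built above):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.ResultScanner;

HTable table = new HTable(HBaseConfiguration.create(), "mytable");
ResultScanner scanner = table.getScanner(s);
try {
    for (Result r : scanner) {
        // each Result carries only the doc0000..doc0100 slice of the row
    }
} finally {
    scanner.close();
}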
You can also limit the number of columns within a row returned at a time via Scan.setBatch.
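Combined with the single-row scan above, that is how you iterate over millions of qualifiers without materializing the whole row: the scanner streams the row back as a sequence of partial Results, each holding at most the batch size of cells. A sketch (process(...) is a hypothetical stand-in for your own handling):

Scan s = new Scan(Bytes.toBytes("some-row-key"), Bytes.toBytes("some-row-key"));
s.setBatch(1000);                              // at most 1000 cells per Result
ResultScanner scanner = table.getScanner(s);
for (Result partial : scanner) {
    // successive Results here can share the same row key; each carries
    // the next chunk of up to 1000 columns
    for (KeyValue kv : partial.raw()) {        // org.apache.hadoop.hbase.KeyValue
        process(kv.getQualifier(), kv.getValue()); // hypothetical callback
    }
}
scanner.close();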