Compressed SortedSet<Long> implementation
I need to store a large number of Long
values in a SortedSet
implementation in a space-efficient manner. I was considering bit-set implem开发者_如何学运维entations and discovered Javaewah. However, the API expects int
values rather than long
s.
Can anyone recommend any alternatives or suggest a good way to solve this problem? I am mainly concerned with space efficiency. Upon building the set I will need to access the minimum and maximum element once. However, access time is not a huge concern (i.e. so a fully run-length encoded implementation will be fine).
EDIT
I should be clear that the implementation does not have to implement the SortedSet
interface providing I can access the minimum and maximum elements of the collection.
You could use TLongArrayList which uses a long[]
underneath. It supports sort()
so the min and max will be the first and last value.
Or you could use a long[]
with a length and do this yourself. ;)
This will use about 64 byte more than the raw values themselves. You can get more compact if you can make some assumptions about the range of long values. e.g. if they are actually limited to 48-bit.
You might consider using LongBuffer. If it is memory mapped it avoids using heap or direct memory, but you would have implement a sort routine yourself.
If they are clustered, you might be able to represent the data as a Set of ranges. The ranges could be a pure A - B, or a BitSet with a starting value. The later works well for phone numbers. ;)
Not sure if it has Set or how efficient it is compared to regular JCF, but take a look at this:
http://commons.apache.org/primitives/
精彩评论