HashSet of ByteBuffer(actually integers) to separate unique & non unique elements from a ByteBuffer array
I have an array of ByteBuffer
s(which actually represent integers). I want to the separate unique & non unique ByteBuffers (i.e integers) in the array. Thus I am using HashSet of this type:
HashSet<ByteBuffer> columnsSet = new HashSet<ByteBuffer>()
Just wanted to know if HashSe开发者_StackOverflow社区t
is a good way to do so? And do I pay more costs, when doing so for a ByteBuffer
then if I would have done it for a Integer
?
(Actually I am reading serialized data from DB which needs to be written back after this operation thus I want to avoid serialization and deserialization between bytebuffer to Integer and back!)
Your thoughts upon this appreciated.
Creating a ByteBuffer is far more expensive than reading/writing from a reused ByteBuffer.
The most efficient way to store integers is to use int
type. If you want a Set of these you can use TIntHashSet which uses int primitives. You can do multiple read/deserialize/store and reverse with O(1) pre-allocated objects.
First of all, it will work. The overhead of equals()
on two ByteBuffer
s will definitely be higher, but perhaps not enough to offset the benefits of not having to deserialize (though, I'm not entirely sure if that would be such a big problem).
I'm pretty sure that the performance will asymptotically be the same, but a more memory-efficient solution is to sort your array, then step through it linearly and test successive elements for equality.
An example, suppose your buffers contain the following:
1 2 5 1
Sort it:
1 1 2 5
Once you start iterating, you get ar[0].equals(ar[1])
and you know these are duplicates. Just keep going like that till n-1
.
Collections normally operate on the equals()
and hashCode()
methods, so performance implications would come through the implementation of the objects stored in the collection.
Looking at ByteBuffer
and Integer
one can see that the implementation of those methods in Integer
are simpler (just one int comparison for equals()
and return value;
for hashCode()
). Thus you could say the Set<ByteBuffer>
has higher cost than a Set<Integer>
.
However, I can't tell you right now if this cost is higher than the serialization and deserialization cost.
In fact, I'd just go for the more readable code unless you really have a performance problem. In that case I'd just try both methods and take the faster one.
精彩评论