HashSet of ByteBuffer (actually integers) to separate unique and non-unique elements from a ByteBuffer array

2023-02-18 04:05 Author:

I have an array of ByteBuffers (which actually represent integers). I want to separate the unique and non-unique ByteBuffers (i.e. integers) in the array, so I am using a HashSet of this type:

HashSet<ByteBuffer> columnsSet = new HashSet<ByteBuffer>();

Is a HashSet a good way to do this? And do I pay a higher cost doing it with ByteBuffer than I would with Integer?

(I am reading serialized data from a DB, and it has to be written back after this operation, so I want to avoid deserializing each ByteBuffer to an Integer and serializing it back.)

Your thoughts on this are appreciated.

Creating a ByteBuffer is far more expensive than reading/writing from a reused ByteBuffer.

The most efficient way to store integers is to use the int type. If you want a Set of these, you can use TIntHashSet (from the GNU Trove library), which stores int primitives. You can do multiple read/deserialize/store cycles, and the reverse, with a constant number of pre-allocated objects.
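To make the "reuse one buffer" idea concrete, here is a rough sketch. It assumes each column buffer holds one 4-byte int starting at its current position. The class and method names (ReusedBufferSketch, peekInt, encode, distinctValues) are made up for illustration. Trove is not part of the JDK, so java.util.HashSet<Integer> stands in for TIntHashSet here; that stand-in boxes every value, which the primitive set would avoid.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

final class ReusedBufferSketch {
    // One scratch buffer, allocated once and reused for every write-back.
    private final ByteBuffer scratch = ByteBuffer.allocate(Integer.BYTES);

    /** Reads the int at the buffer's current position without moving the position. */
    static int peekInt(ByteBuffer column) {
        return column.getInt(column.position()); // absolute get: no state change
    }

    /** Encodes value into the reused scratch buffer; the result is only valid until the next call. */
    ByteBuffer encode(int value) {
        scratch.clear();       // position = 0, limit = capacity
        scratch.putInt(value); // relative put advances position to 4
        scratch.flip();        // limit = 4, position = 0: ready to be read or written out
        return scratch;
    }

    /** Collects the distinct values. HashSet<Integer> is a JDK stand-in for a primitive set. */
    static Set<Integer> distinctValues(ByteBuffer[] columns) {
        Set<Integer> seen = new HashSet<>();
        for (ByteBuffer column : columns) {
            seen.add(peekInt(column));
        }
        return seen;
    }
}
```

Because encode() always hands back the same buffer, the caller has to write it out (or copy it) before encoding the next value.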

First of all, it will work. The overhead of equals() on two ByteBuffers will definitely be higher than on two Integers, but perhaps not enough to offset the benefit of not having to deserialize (though I'm not entirely sure deserialization would be such a big problem).
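For reference, a minimal sketch of what the HashSet approach could look like. The names ByteBufferPartition, of and ofInt are hypothetical, and "non-unique" is assumed to mean "occurs more than once in the array". It relies on Set.add() returning false when an equal element is already present.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

final class ByteBufferPartition {
    /** Buffers whose content occurs exactly once in the input. */
    final Set<ByteBuffer> unique = new HashSet<>();
    /** One representative for each content that occurs more than once. */
    final Set<ByteBuffer> repeated = new HashSet<>();

    static ByteBufferPartition of(ByteBuffer[] columns) {
        ByteBufferPartition p = new ByteBufferPartition();
        for (ByteBuffer b : columns) {
            if (p.repeated.contains(b)) {
                continue; // already known to be a duplicate
            }
            // add() returns false when an equal buffer was seen before
            if (!p.unique.add(b)) {
                p.unique.remove(b);
                p.repeated.add(b);
            }
        }
        return p;
    }

    /** Demo helper (assumption): a 4-byte buffer holding v, position left at 0. */
    static ByteBuffer ofInt(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(0, v);
    }

    public static void main(String[] args) {
        ByteBuffer[] columns = { ofInt(1), ofInt(2), ofInt(5), ofInt(1) };
        ByteBufferPartition p = of(columns);
        // prints "2 unique, 1 repeated"
        System.out.println(p.unique.size() + " unique, " + p.repeated.size() + " repeated");
    }
}
```

Note that the buffers must not be modified (content, position or limit) while they sit in the sets, because their hash codes depend on the remaining bytes.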

Asymptotically the two are close (O(n log n) for sorting versus expected O(n) for hashing), but a more memory-efficient solution is to sort your array, then step through it linearly and test successive elements for equality.

For example, suppose your buffers contain the following:

1 2 5 1

Sort it:

1 1 2 5

Once you start iterating, you find that ar[0].equals(ar[1]), so you know these are duplicates. Just keep going like that until you reach index n-1.
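The sort-and-scan idea could be sketched roughly like this. SortAndScan and partition are illustrative names, and the two output lists are an assumption about how you want the result. ByteBuffer implements Comparable<ByteBuffer>, so Arrays.sort works on the array directly. The ordering is a lexicographic byte comparison rather than numeric order for every value, but equal buffers still end up next to each other, which is all the scan needs.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

final class SortAndScan {
    /**
     * Sorts ar in place, then walks over runs of equal buffers.
     * A run of length 1 goes to unique; longer runs contribute one buffer to repeated.
     */
    static void partition(ByteBuffer[] ar, List<ByteBuffer> unique, List<ByteBuffer> repeated) {
        Arrays.sort(ar); // uses ByteBuffer.compareTo on the remaining bytes
        int i = 0;
        while (i < ar.length) {
            int j = i + 1;
            // extend the run while the next element equals the first of the run
            while (j < ar.length && ar[j].equals(ar[i])) {
                j++;
            }
            if (j - i == 1) {
                unique.add(ar[i]);
            } else {
                repeated.add(ar[i]);
            }
            i = j; // jump to the start of the next run
        }
    }
}
```

For the input 1 2 5 1 this leaves 2 and 5 in unique and a single 1 in repeated. No extra set is allocated, but the original array order is lost.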

Collections normally operate on the equals() and hashCode() methods, so performance implications would come through the implementation of the objects stored in the collection.

Looking at ByteBuffer and Integer, one can see that the implementation of those methods in Integer is simpler (a single int comparison for equals(), and just returning the value for hashCode()), whereas ByteBuffer has to walk over the remaining bytes. Thus you could say a Set<ByteBuffer> has a higher cost than a Set<Integer>.
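A small demonstration of how ByteBuffer's equals() and hashCode() behave, in case it helps. BufferEqualityDemo and ofInt are made-up names; the buffers are assumed to hold one 4-byte int each. Both methods only look at the bytes between position and limit, so two separate buffers with the same content are equal, and moving the position changes the result.

```java
import java.nio.ByteBuffer;

final class BufferEqualityDemo {
    /** Demo helper (assumption): a 4-byte buffer holding v, position left at 0. */
    static ByteBuffer ofInt(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(0, v);
    }

    public static void main(String[] args) {
        ByteBuffer a = ofInt(42);
        ByteBuffer b = ofInt(42);

        System.out.println(a.equals(b));                  // true: same remaining bytes
        System.out.println(a.hashCode() == b.hashCode()); // true: hash is content-based

        a.position(2); // only 2 bytes remain in a, 4 in b
        System.out.println(a.equals(b));                  // false

        // Integer compares a single int value
        System.out.println(Integer.valueOf(42).equals(42)); // true
    }
}
```

This is also why the ByteBuffer documentation advises against using buffers as keys in hash-based collections unless their contents are known not to change.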

However, I can't tell you right now if this cost is higher than the serialization and deserialization cost.

In fact, I'd just go for the more readable code unless you really have a performance problem. In that case I'd just try both methods and take the faster one.
