What load factor should be used when you know the maximum possible number of elements in a HashSet
What load factor should I use when I know the maximum possible number of elements in a HashSet? I have heard that the default load factor of 0.75 is recommended because it offers a good trade-off between speed and space. Is this correct? On the other hand, a larger HashSet also takes more time to create and more space.
I am using a HashSet just in order to remove duplicate integers from a list of integers.
I spent some time playing around with load factors once, and it is shocking how little difference that setting really makes in practice. Even setting it to something high like 2.0 doesn't slow things down much, nor does it save that much memory. Just pretend it doesn't exist. Josh has often regretted ever exposing it as an option at all.
For your stated problem, instead of using a HashSet, you might also consider a BitSet
Depending on the range and sparsity of your integers, you may get better performance and space characteristics.
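For example, a minimal sketch of the BitSet approach (assuming the integers are non-negative and their maximum value is known; the class and method names here are just for illustration):

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class BitSetDedupe {
    // Removes duplicates while preserving first-seen order.
    // Assumes all values are in the range 0..maxValue.
    static List<Integer> dedupe(List<Integer> input, int maxValue) {
        BitSet seen = new BitSet(maxValue + 1); // one bit per possible value
        List<Integer> out = new ArrayList<>();
        for (int v : input) {
            if (!seen.get(v)) { // first occurrence of this value
                seen.set(v);
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(dedupe(List.of(3, 1, 3, 7, 1), 10)); // [3, 1, 7]
    }
}
```

A BitSet uses one bit per possible value, so for a dense range it can be far smaller than a HashSet of boxed Integers; for a sparse range (e.g. a few values spread over billions), the HashSet wins.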
It depends a lot on your integers. The point of the load factor is to "balance" the hash function: with a "perfect" hash function, your load factor could be 1.0. However, if the integer values in question show any sort of regularity, this may result in more hash collisions than average, which decreases the efficiency of the map. In that case a lower load factor may help to spread the values a bit better (over a larger range) and thus reduce hash collisions.
I wouldn't worry much about the creation time and extra space taken by using a lower load factor - I doubt you would ever notice the difference (unless you are on a platform with limited hardware, or have several millions of integers in your map - then the size difference may become noticeable, roughly on the range of a few extra megabytes per million values).
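If you do know the element count up front, a common alternative to tuning the load factor is to presize the table so it never rehashes. A minimal sketch (the helper name is hypothetical; the sizing formula follows from the contract that a rehash happens once size exceeds capacity × loadFactor):

```java
import java.util.HashSet;
import java.util.Set;

class Presized {
    // Returns a HashSet that can hold expectedSize elements without rehashing,
    // keeping the default load factor of 0.75.
    static <T> Set<T> presizedSet(int expectedSize) {
        // We need expectedSize <= capacity * 0.75, so round up.
        int capacity = (int) Math.ceil(expectedSize / 0.75);
        return new HashSet<>(capacity);
    }

    public static void main(String[] args) {
        Set<Integer> s = presizedSet(1000);
        for (int i = 0; i < 1000; i++) s.add(i); // no rehash along the way
        System.out.println(s.size()); // 1000
    }
}
```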
If you know EXACTLY how many you should have, you should put the load factor at 1 and ensure your hash function maps 1:1. You may want to extend your container to not rehash your hash.
Note that this sort of 'exact' thing tends to change over time, so you might be best off with just your normal container. :)
EDIT: My answer was before I knew it was integers.
Yeah, your best bet is to just leave as it is. You'll never notice the difference.
/**
* Remove duplicates from a list.
* @note This will ALTER the list.
* @note This is not thread safe.
 * @param list the list (potentially with duplicates)
*/
void removeDuplicates(List<Integer> list) {
Set<Integer> noDupe = new HashSet<Integer>(list.size()); // will end up resizing once, oh well
for(Integer i : list) noDupe.add(i);
list.clear();
list.addAll(noDupe);
}