Why does ArrayList grow at a rate of 1.5, but HashMap at 2?
As per the Sun Java implementation, during expansion an ArrayList grows to 3/2 of its current capacity, whereas a HashMap doubles. What is the reason behind this?
As per the implementation, the capacity of a HashMap should always be a power of two. That may be a reason for HashMap's behavior. But in that case the question is: why should the capacity of a HashMap always be a power of two?
The expensive part of increasing the capacity of an ArrayList is copying the content of the backing array to a new (larger) one.
For the HashMap, it is creating a new backing array and putting all map entries into the new array. And the higher the capacity, the lower the risk of collisions. This is more expensive and explains why the expansion factor is higher. The reason for 1.5 vs. 2.0? I consider this "best practice" or a "good tradeoff".
for HashMap why the capacity should always be in power of two?
I can think of two reasons.
You can quickly determine the bucket a hashcode goes into. You only need a bitwise AND and no expensive modulo.
int bucket = hashcode & (size-1);
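To make the AND-versus-modulo point concrete, here is a minimal sketch. The class and method names (BucketMask, byMask, byModulo) are made up for illustration and are not part of the JDK. It assumes the table size is a power of two and compares the mask against Math.floorMod, since the plain % operator can return a negative value for negative hashcodes.

```java
// Sketch: for a power-of-two table size, masking gives the same index as a floor modulo.
// BucketMask, byMask and byModulo are hypothetical names, not JDK code.
class BucketMask {
    // Index via bitwise AND; only meaningful when size is a power of two.
    static int byMask(int hashcode, int size) {
        return hashcode & (size - 1);
    }

    // Index via floor modulo; works for any positive size, but needs a division.
    static int byModulo(int hashcode, int size) {
        return Math.floorMod(hashcode, size);
    }

    public static void main(String[] args) {
        int size = 16; // power of two
        int[] samples = {0, 7, 16, 952, -1, Integer.MIN_VALUE, "hello".hashCode()};
        for (int h : samples) {
            System.out.println(h + " -> mask " + byMask(h, size)
                    + ", modulo " + byModulo(h, size));
        }
    }
}
```

For every sample the two columns agree, which is why a power-of-two table can skip the division entirely.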
Let's say we have a growth factor of 1.7. If we start with a size of 11, the next size would be 18, then 31. No problem, right? But the hashcodes of Strings in Java are calculated with a prime factor of 31. The bucket a String goes into, hashcode % 31, is then determined only by the last character of the String. Bye bye O(1) if you store folders that all end in /. If you use a size of, for example, 3^n, the distribution will not get worse if you increase n. Going from size 3 to 9, every element in bucket 2 will now go to bucket 2, 5 or 8, depending on the higher digit. It's like splitting each bucket into three pieces. So an integer growth factor would be preferred. (Of course this all depends on how you calculate hashcodes, but an arbitrary growth factor doesn't feel 'stable'.)
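A small sketch of both effects, under some assumptions: the class and method names are made up, the folder names are arbitrary examples, and the strings are kept short (five characters or fewer) so that String.hashCode() does not overflow an int. With longer strings the overflow muddies the picture, so treat this as an illustration rather than a proof.

```java
import java.util.TreeSet;

// Sketch with hypothetical names; not JDK code.
class LastCharBuckets {
    // Bucket of a String in a table of the given size, using a plain modulo.
    static int bucket(String s, int size) {
        return Math.floorMod(s.hashCode(), size);
    }

    // Which buckets of a larger table receive the values that sat in
    // one bucket of a smaller table.
    static TreeSet<Integer> split(int oldSize, int oldBucket, int newSize) {
        TreeSet<Integer> targets = new TreeSet<>();
        for (int value = 0; value < 1000; value++) {
            if (value % oldSize == oldBucket) {
                targets.add(value % newSize);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        // Short folder names that all end in '/' (char code 47, and 47 % 31 == 16).
        String[] folders = {"a/", "bin/", "usr/", "etc/", "tmp/"};
        for (String f : folders) {
            System.out.println(f + " -> bucket " + bucket(f, 31)); // always 16
        }
        // Growing from 3 to 9 buckets splits bucket 2 into three buckets.
        System.out.println(split(3, 2, 9)); // [2, 5, 8]
    }
}
```

With a table of size 31 every one of these folder names lands in the same bucket, while growing a table from 3 to 9 spreads the old bucket 2 evenly over three new buckets.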
The way HashMap is designed and implemented, its underlying number of buckets must be a power of 2 (even if you give it a different size, it rounds it up to a power of 2), thus it grows by a factor of two each time. An ArrayList can be any size, so it can be more conservative in how it grows.
The accepted answer does not actually give an exact response to the question, but the comment from @user837703 on that answer clearly explains why HashMap grows by a power of two.
I found this article, which explains it in detail http://coding-geek.com/how-does-a-hashmap-work-in-java/
Let me post fragment of it, which gives detailed answer to the question:
// the function that returns the index of the bucket from the rehashed hash
static int indexFor(int h, int length) {
    return h & (length-1);
}
In order to work efficiently, the size of the inner array needs to be a power of 2, let’s see why.
Imagine the array size is 17, the mask value is going to be 16 (size -1). The binary representation of 16 is 0…010000, so for any hash value H the index generated with the bitwise formula “H AND 16” is going to be either 16 or 0. This means that the array of size 17 will only be used for 2 buckets: the one at index 0 and the one at index 16, not very efficient…
But, if you now take a size that is a power of 2 like 16, the bitwise index formula is “H AND 15”. The binary representation of 15 is 0…001111 so the index formula can output values from 0 to 15 and the array of size 16 is fully used. For example:
- if H = 952 , its binary representation is 0..01110111000, the associated index is 0…01000 = 8
- if H = 1576 its binary representation is 0..011000101000, the associated index is 0…01000 = 8
- if H = 12356146, its binary representation is 0..0101111001000101000110010, the associated index is 0…00010 = 2
- if H = 59843, its binary representation is 0..01110100111000011, the associated index is 0…00011 = 3
This is why the array size is a power of two. This mechanism is transparent for the developer: if he chooses a HashMap with a size of 37, the Map will automatically choose the next power of 2 after 37 (64) for the size of its inner array.
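The claims in the quoted fragment are easy to check with a short sketch. indexFor is copied from the fragment; MaskCoverage, reachable and roundUpToPowerOfTwo are names I made up, and the rounding helper is only one possible way to get the next power of two, not necessarily how the JDK does it.

```java
import java.util.TreeSet;

// Sketch with hypothetical names; only indexFor comes from the quoted article.
class MaskCoverage {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Collect every index that hashes 0..999 can reach for a given array length.
    static TreeSet<Integer> reachable(int length) {
        TreeSet<Integer> seen = new TreeSet<>();
        for (int h = 0; h < 1000; h++) {
            seen.add(indexFor(h, length));
        }
        return seen;
    }

    // Assumed helper: smallest power of two >= n, for 1 <= n <= 2^30.
    static int roundUpToPowerOfTwo(int n) {
        int highest = Integer.highestOneBit(n);
        return (highest == n) ? n : highest << 1;
    }

    public static void main(String[] args) {
        System.out.println(reachable(17));        // [0, 16]: only two buckets used
        System.out.println(reachable(16).size()); // 16: every bucket used
        System.out.println(indexFor(952, 16));    // 8
        System.out.println(indexFor(59843, 16));  // 3
        System.out.println(roundUpToPowerOfTwo(37)); // 64
    }
}
```

An array of length 17 only ever sees indexes 0 and 16, while length 16 uses all sixteen slots, which matches the article's argument.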
Hashing works best when data is distributed evenly across buckets. The algorithm tries to prevent multiple entries from landing in the same bucket ("hash collisions"), as they decrease performance.
Now when the capacity of a HashMap is reached, the size is extended and the existing data is re-distributed across the new buckets. If the size increase were too small, this re-allocation of space and re-distribution would happen too often.
A general rule to avoid collisions in Maps is to keep the maximum load factor at around 0.75. To decrease the possibility of collisions and avoid the expensive copying process, HashMap grows at a larger rate.
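To get a feeling for "too often", here is a rough model, not the real HashMap code. It assumes the table is resized as soon as the number of entries exceeds capacity * loadFactor, starts from a capacity of 16 with a load factor of 0.75, and counts how many redistributions are needed for a given number of entries. The 1.5 variant is purely hypothetical, since the real HashMap needs a power-of-two capacity.

```java
// Rough model with hypothetical names; it only counts resizes, it stores nothing.
class ResizeCount {
    static int resizes(int entries, int initialCapacity, double growthFactor, double loadFactor) {
        long capacity = initialCapacity;
        int count = 0;
        for (int size = 1; size <= entries; size++) {
            // Assumed rule: grow once the entry count exceeds capacity * loadFactor.
            while (size > capacity * loadFactor) {
                // Math.max guards against a factor that fails to increase the capacity.
                capacity = Math.max(capacity + 1, (long) (capacity * growthFactor));
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int entries = 1_000_000;
        System.out.println("factor 2.0: " + resizes(entries, 16, 2.0, 0.75) + " resizes");
        System.out.println("factor 1.5: " + resizes(entries, 16, 1.5, 0.75) + " resizes");
    }
}
```

In this model, doubling needs 17 redistributions to hold a million entries, and the smaller factor needs noticeably more, each one touching every entry already stored.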
Also as @Peter says, it must be a power of 2.
I can't give you a reason why this is so (you'd have to ask the Sun developers), but to see how this happens take a look at the source:
HashMap: Take a look at how HashMap resizes to a new size (source line 799)
resize(2 * table.length);
ArrayList: source, line 183:
int newCapacity = (oldCapacity * 3)/2 + 1;
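Applying the two formulas repeatedly shows how the capacities evolve. This is only a sketch: the class and method names are mine, and the starting values of 10 for ArrayList and 16 for HashMap are, as far as I know, the defaults in Sun's JDK.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch with hypothetical names; the two formulas are the ones quoted above.
class GrowthSequence {
    static int growLikeArrayList(int oldCapacity) {
        return (oldCapacity * 3) / 2 + 1;
    }

    static int growLikeHashMap(int oldLength) {
        return 2 * oldLength;
    }

    // Capacities after 0..resizes growth steps, starting from start.
    static List<Integer> sequence(int start, int resizes, boolean likeHashMap) {
        List<Integer> result = new ArrayList<>();
        int capacity = start;
        result.add(capacity);
        for (int i = 0; i < resizes; i++) {
            capacity = likeHashMap ? growLikeHashMap(capacity) : growLikeArrayList(capacity);
            result.add(capacity);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sequence(10, 5, false)); // [10, 16, 25, 38, 58, 88]
        System.out.println(sequence(16, 5, true));  // [16, 32, 64, 128, 256, 512]
    }
}
```

After five growth steps the ArrayList has gone from 10 to 88 slots, while the HashMap table has gone from 16 to 512 and stayed a power of two the whole way.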
Update: I mistakenly linked to sources of Apache Harmony JDK - changed it to Sun's JDK.