开发者

Java ConcurrentHashMap is better than HashMap performance wise?

I was just reading the book Clean Code and came across this statement:

When Java was young Doug Lea wrote the seminal book[8] Concurrent Programming in Java. Along with the book he developed several thread-safe collection, which later became part of the JDK in the java.util.concurrent p开发者_运维技巧ackage. The collections in that package are safe for multithreaded situations and they perform well. In fact, the ConcurrentHashMap implementation performs better than HashMap in nearly all situations. It also allows for simultaneous concurrent reads and writes, and it has methods supporting common composite operations that are otherwise not thread safe. If Java 5 is the deployment environment, start with ConcurrentHashMap

Note that in the above quote I used "[n]", where n is some number, to indicate the places where the author provided references, and as you can see he did not provide any reference for the bold part.

Not that I don't believe this statement, but I would love to know the supporting evidences of this statement. So, does anyone know any resources that shows the performance statistics for both ConcurrentHashMap and HashMap? Or can anyone explain to me why ConcurrentHashMap is faster than HashMap?

I probably will look into ConcurrentHashMap's implementation at work when I'm taking a break, but for now I would like to hear the answers from fellow SOers.


Doug Lea is extremely good at these things, so I won't be surprised if at one time his ConcurrentHashMap performs better than Joshua Bloch's HashMap. However as of Java 7, the first @author of HashMap has become Doug Lea too. Obviously now there's no reason HashMap would be any slower than its concurrent cousin.

Out of curiosity, I did some benchmark anyway. I run it under Java 7. The more entries there are, the closer the performance is. Eventually ConcurrentHashMap is within 3% of HashMap, which is quite remarkable. The bottleneck is really memory access, as the saying goes, "memory is the new disk (and disk is the new tape)". If the entries are in the cache, both will be fast; if the entries don't fit in cache, both will be slow. In real applications, a map doesn't have to be big to compete with others for residing in cache. If a map is used often, it's cached; if not, it's not cached, and that is the real determining factor, not the implementations (given both are implemented by the same expert)

public static void main(String[] args)
{
    for(int i = 0; i<100; i++)
    {
        System.out.println();

        int entries = i*100*1000;
        long t0=test( entries, new FakeMap() );
        long t1=test( entries, new HashMap() );
        long t2=test( entries, new ConcurrentHashMap() );

        long diff = (t2-t1)*100/(t1-t0);
        System.out.printf("entries=%,d time diff= %d%% %n", entries, diff);
    }
}


static long test(int ENTRIES, Map map)
{
    long SEED = 0;
    Random random = new Random(SEED);

    int RW_RATIO = 10;

    long t0 = System.nanoTime();

    for(int i=0; i<ENTRIES; i++)
        map.put( random.nextInt(), random.nextInt() );

    for(int i=0; i<RW_RATIO; i++)
    {
        random.setSeed(SEED);
        for(int j=0; j<ENTRIES; j++)
        {
            map.get( random.nextInt() );
            random.nextInt();
        }
    }
    long t = System.nanoTime()-t0;
    System.out.printf("%,d ns %s %n", t, map.getClass());
    return t;
}


static class FakeMap implements Map
{
    public Object get(Object key)
    {
        return null;  
    }
    public Object put(Object key, Object value)
    {
        return null;  
    }
    // etc. etc.
}


If you are accessing the HashMap with only a single thread HashMap is fastest (it does not do any synchronization), if you are accessing it from multiple threads ConcurrentHashMap is faster than doing the synchronization coarse-grained by hand. See here for a little comparison:

http://www.codercorp.com/blog/java/why-concurrenthashmap-is-better-than-hashtable-and-just-as-good-hashmap.html


The reason a HashMap could be slower is because it has to detect ConcurrentModification to know when to throw an exception. ConcurrentHashMap does not have to check modCount to know when to throw (but it does use it for size() and isEmpty()). Acquiring a lock is very fast, especially so in single-threaded situations when you already hold the lock, but checking modCount is two reads and a jump-if-not-equal that HashMap must pay to throw CoModException.

I recommend reading the source of your collections classes, so you know how much work they are doing when you make a method call. In situations when you have a completely private map for dictionary get/put only, you can often used a stripped down HashMap without any modCount or even size tracking for added performance boost.


This is the kind of rubbery statement that is hard to prove one way or the other. How do you measure something in "nearly all situations"?

A ConcurrentHashMap is likely to be better than a synchronized HashMap. The more contention there is, the more significant the difference will be. On the other hand an unsynchronized HashMap is likely to be faster than a ConcurrentHashMap, because of the overhead of unnecessary locking in the latter case.


I'd also like to see the context for that statement, and what evidence the book's author puts forward to support it. And the evidence for the unstated assumption that "nearly all" use-cases for hash maps involve synchronization.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜