JVM on multicore
I read a blog post a while ago claiming that a Java application ran better when it was restricted to a single CPU on a multicore machine: http://mailinator.blogspot.com/2010/02/how-i-sped-up-my-server-by-factor-of-6.html
What reasons could there be for a Java application to run much slower on a multicore machine than on a single-core machine?
If there is significant contention for shared resources among the threads, locking and unlocking objects could require a large amount of inter-processor traffic (IPIs, or inter-processor interrupts, and cache-coherence messages), and the processors may spend more time invalidating their L1 and L2 caches and re-fetching data from other CPUs than they spend making progress on the problem at hand.
This can be a problem if the application's locking is far too fine-grained. (I once heard it summed up as "there is no point having more than one lock per CPU cache line", which is definitely true, and perhaps still too fine-grained.)
Java's "every object is a mutex" design could lead to too many locks in the running system if too many of them are live and contended.
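To make the contention point concrete, here is a minimal sketch, not taken from the blog post. The class `ContentionSketch` and its method names are hypothetical. It compares a counter guarded by one shared monitor with `java.util.concurrent.atomic.LongAdder`, which spreads updates across internal cells. Both produce the same total; how much they differ in speed depends on the machine, so no timings are claimed here.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical demo class; names are illustrative only.
class ContentionSketch {

    private static final Object LOCK = new Object();
    private static long sharedCount = 0;

    // Every increment takes the same monitor, so all threads contend on one lock.
    static long countWithSingleLock(int threads, int perThread) {
        sharedCount = 0;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    synchronized (LOCK) {
                        sharedCount++;
                    }
                }
            });
            workers[t].start();
        }
        joinAll(workers);
        return sharedCount;
    }

    // LongAdder keeps several internal cells and sums them on demand,
    // which reduces contention between updating threads.
    static long countWithLongAdder(int threads, int perThread) {
        LongAdder adder = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    adder.increment();
                }
            });
            workers[t].start();
        }
        joinAll(workers);
        return adder.sum();
    }

    // Waits for all workers; join() also makes their writes visible to the caller.
    private static void joinAll(Thread[] workers) {
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long a = countWithSingleLock(4, 1_000_000);
        long mid = System.nanoTime();
        long b = countWithLongAdder(4, 1_000_000);
        long end = System.nanoTime();
        System.out.println("single lock: " + a + " in " + (mid - start) / 1_000_000 + " ms");
        System.out.println("LongAdder:   " + b + " in " + (end - mid) / 1_000_000 + " ms");
    }
}
```

Running this once pinned to one core and once unpinned is one way to see whether lock contention is what hurts on a given machine.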
I have no doubt someone could intentionally write such an application, but it probably isn't very common. Most developers would write their applications to reduce resource contention where they can.
I doubt the "much" part.
My guess would be that the expense of moving state from one CPU to another is high enough to be noticeable. Generally you want jobs to stay on the same CPU so that their data stays cached locally as much as possible.
This is entirely speculation without the article/data in question, but there are some types of programs which are not well suited for parallelization - perhaps the application is never CPU-bound (meaning the CPU is not the bottleneck, perhaps some sort of I/O is).
However this question/conversation is pretty baseless without more details.
There is no Java-specific reason for this, but moving state from core to core, or even from CPU to CPU, takes time. That time is better spent if the process stays on a single core, which also improves cache utilization.
This is only relevant, though, if the program does not use multiple threads and therefore cannot distribute its work across multiple cores/CPUs effectively.
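As a small illustration of the single-core case, the sketch below asks the JVM how many processors it can use and sizes a worker pool accordingly. `CoreCountSketch` and `workerCount` are hypothetical names. The assumption, which you should verify on your own platform, is that when the process is restricted to one core by an OS-level affinity tool, `Runtime.availableProcessors()` reports that reduced count.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; names are illustrative only.
class CoreCountSketch {

    // One worker per processor the JVM reports, and never fewer than one.
    static int workerCount() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    public static void main(String[] args) {
        int n = workerCount();
        System.out.println("Processors visible to this JVM: " + n);

        // Size the pool from the reported count instead of hard-coding it.
        ExecutorService pool = Executors.newFixedThreadPool(n);
        pool.shutdown();
    }
}
```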
The application could make very poor use of blocking inter-thread communication. However, this would purely be down to the fact that the application is programmed exceptionally poorly.
There is no reason at all why even a mediocre multi-core application with a moderately parallelisable workload should run slower on multiple cores.
From a pure performance perspective, the challenge is often around the memory subsystem. So while more CPUs are often good, having CPUs that aren't near the memory that the Java objects are sitting in is very, very expensive. It is VERY machine specific, and depends greatly on the exact path between each CPU and memory. Both Intel and AMD have had various topologies and speeds here, and the results vary greatly.
See NUMA for reasons why multi-core might hinder.
We have seen performance deltas in the 30% range or more depending on how JVMs are pinned to processors. SPECjbb2005 is now mostly run in "multi-JVM" mode with each JVM associated with a given CPU / memory for this reason.
The JIT will not include memory barriers if it thinks it's running on a single core. I suspect that is what is happening in the referenced article.
Here is a very concise explanation of memory barriers; it also shows a neat technique for viewing the JIT-compiled code: http://www.infoq.com/articles/memory_barriers_jvm_concurrency
This isn't to say all applications would benefit from being placed on a single core.
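For readers unsure where memory barriers come from in Java source, here is a minimal sketch with a hypothetical class `BarrierSketch`. A write to a `volatile` field publishes the ordinary write made before it, so a reader that sees the flag is guaranteed by the Java Memory Model to see the payload as well. Which machine instructions the JIT emits to enforce this depends on the JVM and the hardware; the code only shows the language-level guarantee.

```java
// Hypothetical demo class; names are illustrative only.
class BarrierSketch {

    private static int payload = 0;                 // plain, non-volatile field
    private static volatile boolean ready = false;  // volatile flag used to publish payload

    // Starts a writer thread, waits for its volatile flag, and returns the payload seen.
    static int publishAndRead() {
        payload = 0;
        ready = false;

        Thread writer = new Thread(() -> {
            payload = 42;   // ordinary write
            ready = true;   // volatile write: makes the write above visible to readers of 'ready'
        });
        writer.start();

        // Volatile read in a spin loop; without 'volatile' this loop might never observe the update.
        while (!ready) {
            Thread.yield();
        }
        int seen = payload; // happens-before via 'ready' guarantees the writer's value

        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("payload seen by reader: " + publishAndRead());
    }
}
```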
Recent Intel CPUs have Turbo Boost:
http://en.wikipedia.org/wiki/Intel_Turbo_Boost
This will depend on the number of threads the application spawns. If you spawn, say, four worker threads doing heavy number-crunching, the app will be almost four times faster on a quad-core machine, depending on how much book-keeping and merging you must do.
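The worker-thread scenario above can be sketched as follows. The class `WorkerSketch` and the sum-of-squares workload are assumptions chosen only for illustration. The range is split into one chunk per worker, each worker computes a partial sum, and the results are merged at the end. That merge is the "book-keeping" part that keeps the speedup below the ideal factor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; names and workload are illustrative only.
class WorkerSketch {

    // Sums i*i for i in [0, n) using the given number of worker threads.
    static long parallelSumOfSquares(long n, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            long chunk = (n + workers - 1) / workers; // ceiling division

            // Hand each worker its own contiguous slice; no shared mutable state.
            for (int w = 0; w < workers; w++) {
                final long from = Math.min(n, w * chunk);
                final long to = Math.min(n, from + chunk);
                parts.add(pool.submit(() -> {
                    long sum = 0;
                    for (long i = from; i < to; i++) {
                        sum += i * i;
                    }
                    return sum;
                }));
            }

            // Merge step: wait for each partial result and add it up.
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelSumOfSquares(1_000_000L, 4));
    }
}
```

Because the workers share nothing until the merge, this kind of workload is the case where extra cores should help rather than hurt.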
CPUs often have a limit on how much heat they can produce. This means a chip with fewer cores can run at a higher frequency, which can make a program run faster if it doesn't use the extra cores effectively. Today the difference is between 4, 6 and 8 cores, where chips with more cores have individually slower cores. I don't know of any single-core systems which are faster than the fastest 4-core system.