Reference needed: Hardware architecture and performance improvement [HPC/Parallel computing]
There are several ways to improve the performance of HPC applications. One of them is to fine-tune the application for the hardware architecture, which is mostly done on multicore systems. To use this method, one has to really understand the underlying hardware: memory, number of sockets, number of cores per socket, L1/L2 cache, GFLOPS, and so on.
Even though these technical terms look familiar, I still don't have a clear understanding of what exactly they mean for the performance of an application.
Can anyone suggest a good resource or book from which I can understand hardware architecture in terms of performance?
It is very important to tune the code to the target hardware architecture. However, unless you have lots of time and resources, this is impossible to do for the wide variety of systems available.
Optimization follows the 80-20 rule. You get 80% benefit with 20% of effort. Beyond that, your returns will start to diminish.
Here is the process I follow:
1) Obtain the best compiler for your target architecture. Sometimes GNU may be the best compiler for a particular platform, so don't be surprised.
2) Read through the "code optimization" section of the compiler's documentation.
3) Identify the right flags to generate the best code for the target platform. However, make sure you validate the results of the code at every optimization level you try. Higher optimization levels can affect the correctness of the code.
4) Make sure any libraries you need are optimized for that system, for example math libraries, BLAS libraries, etc.
5) Pay special attention to platform-specific hardware features, like SSE (SIMD), the number of cores, or accelerators. You may need to modify your code or provide hints to the compiler so that it can optimize the code better for these features.
You will have to do this for every target platform. By this point you should have most of the benefit for minimal effort.
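The validation in step 3 can be sketched roughly as follows. Floating-point addition is not associative, so a build that reorders a reduction (for instance, GCC's `-ffast-math` permits this) may produce a result that differs in the last bits from the reference build. The function names `sum_forward`, `sum_backward`, and `close_enough` are hypothetical, and the backward sum only stands in for a compiler-reordered reduction; this is an illustration of the idea, not a complete validation harness.

```c
#include <assert.h>
#include <stddef.h>

/* Sum left to right, in the order the source code spells out. */
double sum_forward(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Sum right to left: mathematically identical, but the rounding
   happens in a different order. Stand-in for a reordered reduction. */
double sum_backward(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = n; i > 0; --i)
        s += x[i - 1];
    return s;
}

/* Absolute value without pulling in <math.h>. */
static double abs_d(double v) {
    return v < 0.0 ? -v : v;
}

/* Return 1 if a and b agree to within rel_tol relative to the larger
   magnitude, 0 otherwise. Use this instead of == when comparing the
   output of builds made with different optimization flags. */
int close_enough(double a, double b, double rel_tol) {
    assert(rel_tol >= 0.0);
    double abs_a = abs_d(a);
    double abs_b = abs_d(b);
    double scale = abs_a > abs_b ? abs_a : abs_b;
    return abs_d(a - b) <= rel_tol * scale;
}
```

In practice you would run the same input through the unoptimized and the optimized build and compare the outputs with a tolerance like this. Which tolerance is acceptable depends on your application and is an assumption you have to make explicitly.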
If you need to extract more performance, it almost always requires rewriting your code so that the hardware features are fully exploited.
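A small example of such a rewrite is loop interchange for cache-friendly memory access. C stores two-dimensional data row by row, so walking a matrix column by column touches memory with a stride of one full row, while walking it row by row touches consecutive addresses. The sketch below assumes a matrix stored as a flat row-major array; the function names are hypothetical. Both functions compute the same sum, and whether the second one is actually faster depends on the matrix size relative to your caches, so measure on the target machine.

```c
#include <assert.h>
#include <stddef.h>

/* Column-by-column traversal: consecutive accesses are `cols`
   elements apart, so each one may land on a different cache line. */
double sum_column_order(const double *a, size_t rows, size_t cols) {
    assert(a != NULL);
    double s = 0.0;
    for (size_t j = 0; j < cols; ++j)
        for (size_t i = 0; i < rows; ++i)
            s += a[i * cols + j];
    return s;
}

/* Row-by-row traversal: the loops are interchanged so consecutive
   accesses hit adjacent addresses and reuse each cache line. */
double sum_row_order(const double *a, size_t rows, size_t cols) {
    assert(a != NULL);
    double s = 0.0;
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            s += a[i * cols + j];
    return s;
}
```

Note that the two versions add the elements in a different order, so for general floating-point data their results should be compared with a tolerance rather than with `==`.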
No, there are no books for this. The closest are the "optimization manuals" that vendors (IBM Redbooks, Intel, AMD, Cray) generally provide free of charge.
Examples:
support.amd.com/us/Processor_TechDocs/25112.PDF
http://www.intel.com/products/processor/manuals/
http://www.ibm.com/developerworks/wikis/download/attachments/137167333/Power6_optimization.pdf?version=1
These are the most authoritative resources for these platforms. You should look for the equivalent resources for your own target platform.