
How should an application perform on 64-bit vs. 32-bit Intel architectures?

I want to know the relative performance of a normal C++ application in the following scenarios:

  1. Built as a 32-bit app, run on an Intel 64-bit processor (x86-64)
  2. Built as a 32-bit app, run on an Intel 32-bit processor (x86)
  3. Built as a 64-bit app.

Also, what factors should I consider when modifying or developing the application to make it run faster on 64-bit processors?


Short answer: you probably won't notice much of a difference.

Longer answer: 64-bit x86 has more general-purpose registers, which gives the compiler more opportunity to keep local variables in registers for faster access. The compiler can also assume more modern features: e.g. it does not have to generate code that still runs on a 386, and it can assume your CPU has SSE instead of the old x87 FPU for floating-point math. But pointers are twice as wide, which is worse for the cache.


CPU-intensive programs might be noticeably faster on 64-bit. The processor has 16 instead of 8 general purpose registers available which are also twice as wide (64 instead of 32 bits).

Also, the number of SSE registers is doubled from 8 to 16, which helps multimedia applications and other applications that do a lot of floating-point computation.

For details see x86-64 on Wikipedia.

One thing that has not been mentioned yet is that 64-bit versions of operating systems such as Windows and Linux use a different calling convention for function calls: instead of passing arguments on the stack, arguments are passed in registers where possible, which is in principle faster. Software can therefore be faster because there is less function call overhead.


The performance will very likely depend on your application and can vary a lot, depending on whether or not you use libraries that have optimizations for 64-bit environments. If you want to count on a speedup, you should focus on improving your algorithms rather than on the instruction set architecture.

As for preparing or developing for 64-bit: the key thing is not to make assumptions about types and their sizes. If you need a type with a specific size, use the types defined in <stdint.h>. Whenever you see functions that use size_t or ptrdiff_t, you should use those typedefs rather than some other type.


In general, you won't find equivalent processors that differ only in their support for 64-bit operation, so it'll be hard to give any concrete comparisons between 1) and 2). On the other hand, the difference between building for 32 and 64 bit mode is entirely dependent on the application. A 64-bit version might be slightly slower or slightly faster than the 32-bit version. If your application uses a lot of temporary variables, then the increased register set of 64-bit mode can make a very large difference in performance.


From experience, I've found that a 64-bit recompile of a 32-bit application generally makes things about 30% faster. It's a rough figure, but it holds for quite a number of applications I've ported to 64-bit. Basically, it's for the reasons explained above. You have more registers, which is a godsend and allows for much less swapping of values in and out of memory (although that memory will probably be cached anyway, which makes the win quite small). Certain optimisations can also be made much more easily. HOWEVER, you do suffer from larger pointers, which wipe out some of the gain, not to mention that a context switch requires more memory because of the larger register set.

Careful hand optimisation in 64-bit can provide HUGE performance wins, however.

Your best plan is to recompile as 64-bit and profile, i.e. see which is better.


Do you have any requirement for more than 4 GB of memory? Exploiting lots of memory is really the big reason to go 64-bit.


Do you know anything about multi-channel MC concurrent data-bus bursts, the IMC, and the multi-core features of the new x86_64 architectures? At the very least, memcpy can be optimized to run faster in 64-bit mode because it can use the 64-bit bus and registers, regardless of concurrent bursts. The new architectures are also able to prefetch data from multiple memory modules into cache concurrently, and more.
