Speed of operations on misaligned data

2023-04-05 16:01 Q&A author:

To my knowledge, a CPU performs best with a datum that is aligned on a boundary equal to the size of that datum. For example, if every int datum is 4 bytes in size, then the address of every int must be a multiple of 4 to make the CPU happy; the same goes for 2-byte short data and 8-byte double data. For this reason, the new operator and the malloc function always return an address that is a multiple of 8 and, therefore, a multiple of 4 and 2.

In my program, some time-critical algorithms that are meant to process large byte arrays allow striding through the computation by converting each contiguous 4 bytes into an unsigned int and, in this way, do the arithmetic much faster. However, the address of the byte array is not guaranteed to be a multiple of 4 because only a part of a byte array may need to be processed.

As far as I know, Intel CPUs operate on misaligned data properly but at the expense of speed. If operating on misaligned data is slow enough, the algorithms in my program will need to be redesigned. In this connection I have two questions, the first of which is illustrated by the following code:

// the address of array0 is a multiple of 4:
unsigned char* array0 = new unsigned char[4];
array0[0] = 0x00;
array0[1] = 0x11;
array0[2] = 0x22;
array0[3] = 0x33;
// the address of array1 is a multiple of 4 too:
unsigned char* array1 = new unsigned char[5];
array1[0] = 0x00;
array1[1] = 0x00;
array1[2] = 0x11;
array1[3] = 0x22;
array1[4] = 0x33;
// OP1: the address of the 1st operand is a multiple of 4,
// which is optimal for an unsigned int:
unsigned int anUInt0 = *((unsigned int*)array0) + 1234;
// OP2: the address of the 1st operand is not a multiple of 4:
unsigned int anUInt1 = *((unsigned int*)(array1 + 1)) + 1234;

So the questions are:

How much slower is OP2 than OP1 on x86, x86-64, and Itanium processors (neglecting the cost of the type cast and the address increment)?
When writing portable cross-platform code, which kinds of processors should I be concerned about regarding misaligned data access? (I already know about RISC ones.)

There are far too many processors on the market to be able to give a generic answer. The only thing that can be stated with certainty is that some processors cannot do an unaligned access at all; this may or may not matter to you if your program is intended to run in a homogeneous environment, e.g. Windows.
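For code that has to run on processors that cannot do unaligned accesses, one common approach is to copy the bytes into a properly aligned local variable instead of dereferencing a cast pointer. The following is only a sketch: load_u32 and sum_words are hypothetical helper names, not something from the question, and the result is in the host's byte order, just as with the cast in OP1 and OP2.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read 4 bytes starting at p, whatever its alignment,
// as a 32-bit value in the host's byte order.
inline std::uint32_t load_u32(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);  // copies byte-wise, so no aligned address is required
    return v;
}

// Hypothetical example of striding through a byte range 4 bytes at a time:
// sums the complete 32-bit words (wrapping modulo 2^32) and ignores any
// trailing 1 to 3 bytes.
inline std::uint32_t sum_words(const unsigned char* p, std::size_t n) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i + 4 <= n; i += 4) {
        sum += load_u32(p + i);
    }
    return sum;
}
```

With the arrays from the question, load_u32(array1 + 1) + 1234 would stand in for OP2. On targets that permit unaligned loads, optimizing compilers can often turn the memcpy into a single load instruction, but that depends on the compiler and its settings and is worth checking in the generated code.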

In a modern high-speed processor, the speed of an unaligned access may depend more on how the access is aligned relative to cache lines than on its address alignment as such. On today's x86 processors the cache line size is 64 bytes.
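To make the cache-line point concrete, here is a small sketch that tests whether an access of a given size would straddle a cache-line boundary. The names kCacheLine and crosses_cache_line are hypothetical, and the 64-byte line size is simply the figure quoted above, not something queried from the hardware.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, as stated for today's x86 processors.
constexpr std::size_t kCacheLine = 64;

// Returns true if an access of `size` bytes (size > 0) starting at address
// `addr` touches two cache lines rather than one.
inline bool crosses_cache_line(std::uintptr_t addr, std::size_t size) {
    return (addr % kCacheLine) + size > kCacheLine;
}

// Convenience overload for a pointer into a byte array.
inline bool crosses_cache_line(const unsigned char* p, std::size_t size) {
    return crosses_cache_line(reinterpret_cast<std::uintptr_t>(p), size);
}
```

Under this assumption, a 4-byte read at offset 1 within a line is misaligned but stays inside one line, whereas a 4-byte read at offset 61 spans two lines. Only 3 of the 64 possible starting offsets in a line (61, 62, 63) make a 4-byte access cross the boundary.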

There's a Wikipedia article that might provide some general guidance: http://en.wikipedia.org/wiki/Data_structure_alignment
