Testing for Endianness: Why does the following code work?

While I do understand endianness, I am slightly unclear on how the code below works. I guess this question is less about endianness and more about how the char * pointer and the int interact, i.e. type conversion. Also, would it have made any difference if the variable word were not a short but an int? Thanks!

#define BIG_ENDIAN 0
#define LITTLE_ENDIAN 1

int byteOrder() {
    short int word = 0x0001;
    char * byte = (char *) &word;
    return (byte[0] ? LITTLE_ENDIAN : BIG_ENDIAN);
}


A short int is made up of two bytes, in this case 0x00 and 0x01. On a little endian system, the least significant byte comes first, so in memory it appears as 0x01 followed by 0x00. Big endian systems are, naturally, reversed. This is what the pointers look like for short integers on a little endian system:

----------------------- ----------------------- 
|   0x01   |   0x00   | |          |          | 
----------------------- ----------------------- 
   &word                  &word+1

Char pointers, on the other hand, advance one byte at a time. Thus, by taking the address of the first byte of the integer and casting it to a char * pointer, you can step through each byte of the integer in memory order. Here's the corresponding diagram:

------------ ------------ ------------ ------------ 
|   0x01   | |   0x00   | |          | |          | 
------------ ------------ ------------ ------------ 
   &byte       &byte+1      &byte+2      &byte+3


(char *)&word points to the first (lowest address) char (byte) of word. If your system is little-endian, this will correspond to 0x01; if it is big-endian, this will correspond to 0x00.

And yes, this test should work whether word is a short, an int, or a long (as long as the type is larger than a char).
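
To make this concrete, here is a small sketch of my own (not from the original answer) that dumps every byte of an int in increasing address order. On a typical machine with a 4-byte int, it prints 01 00 00 00 on a little endian system and 00 00 00 01 on a big endian one:

#include <stdio.h>

int main(void) {
    int word = 0x00000001;
    unsigned char *byte = (unsigned char *) &word;

    /* walk the int one byte at a time, lowest address first */
    for (size_t i = 0; i < sizeof word; i++)
        printf("%02x ", byte[i]);
    printf("\n");
    return 0;
}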


That is a cute little program. You have a word set to the hex literal 1. On a little endian system, the least significant byte (0x01 in this case) is at byte[0] when you cast the pointer to a char pointer. So if 0x01 is at offset 0, you know the machine is little endian; otherwise, if 0x00 is at offset 0, you know the least significant byte was stored at the higher memory location (offset 1).

Note: a pointer always points to the lowest memory address of the word, data structure, etc.
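
As an aside, a common variation of the same test uses a union instead of a pointer cast. This sketch is my own illustration (reading a union member other than the one last written, i.e. type punning, is well-defined in C99 and later):

#include <stdio.h>

int isLittleEndian(void) {
    union {
        short int word;
        char bytes[sizeof(short int)];
    } probe = { 0x0001 };

    /* bytes[0] aliases the lowest-addressed byte of word */
    return probe.bytes[0] == 0x01;
}

int main(void) {
    printf("%s\n", isLittleEndian() ? "little endian" : "big endian");
    return 0;
}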


It tells you the endianness of a short, at least on machines where a short is exactly two bytes. It doesn't necessarily tell you the endianness of an int or a long, and of course, when the integral type is larger than two bytes, the choice isn't binary.

The real question is why you would want to know. It's almost always simpler and more robust to write the code so that endianness doesn't matter. (There are exceptions, but they almost always involve very low-level code that will only run on one specific piece of hardware anyway. And if you know the hardware well enough to be writing that sort of code, you know its endianness.)
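
For example (my illustration, not from the answer): when parsing a 32-bit big-endian value out of a file or network buffer, assembling it with shifts produces the same result on any host, so there is nothing to detect:

#include <stdio.h>
#include <stdint.h>

/* read a 32-bit big-endian (network order) value from a byte buffer;
   behaves identically on little- and big-endian hosts */
uint32_t read_be32(const unsigned char *buf) {
    return ((uint32_t) buf[0] << 24)
         | ((uint32_t) buf[1] << 16)
         | ((uint32_t) buf[2] << 8)
         |  (uint32_t) buf[3];
}

int main(void) {
    const unsigned char buf[4] = { 0x00, 0x00, 0x00, 0x01 };
    printf("%u\n", (unsigned) read_be32(buf));   /* prints 1 on any host */
    return 0;
}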


The trick I use to remember the byte order when thinking about big-endian vs little-endian is "the names should be the other way around":

  • When you're writing a number by hand, the natural way to do it is to write left-to-right, starting with the most significant digits and ending with the least significant digits. In your example, you'd first write the most significant byte (i.e. 0) then the least significant byte (i.e. 1). This is how big-endian works. When it writes data to memory (with increasing byte address) it ends with the least-significant bytes - the 'little' bytes. So, big-endian actually ends with little bytes.

  • Same for little-endian: it actually ends with the most-significant byte, i.e. the 'big' bytes.

Your source code checks whether the first byte (i.e. byte[0]) is the least significant byte (1): if so, the machine is a 'little-startian', i.e. what is conventionally called little endian byte ordering.
