
byte swapping and C++/C

I did a post on comp.lang.c++ and got this

http://groups.google.com/group/comp.lang.c++/browse_thread/thread/afc946826945bdb1/90850f6a0e0edd2d#90850f6a0e0edd2d

but that is still not the answer.

I am a little confused about a binary read operation.

I am trying to read a binary file with the stream functions. It is a result file of a commercial program (ANSYS), and I know the structure of the file, at least from the manual.

The file is structured as records, and the program is written in Fortran. So each record looks like:

  • record length (int)
  • dummy integer
  • data (could be int or double)
  • dummy integer

The first record is a block of 100 integers; that block corresponds to the "data" part in the representation above.

If I start reading the file and read the first value, which is the record length (an integer), I have to swap the bytes to get the correct value of 100.

I did not understand why I have to swap the bytes: this file was generated on the same machine, so both programs should be using the same system-specific routines, and byte order should not be a problem. But that does not seem to be the case; there is something else going on that I could not understand. Could the software be forcing the byte swap? I would have a hard time understanding the reason for that.

Any comments are appreciated.

Here is a naive test case

#include <fstream>
#include <iostream>
using namespace std;

int main () {
  ifstream myfile;
  char intBuffer[4];
  myfile.open ("truss.rst", ios::binary);
  myfile.read(intBuffer, sizeof(int));
  //cout << *((int*)intBuffer) << endl;
  // if I do not use this portion-
  // I do not get what I want
  char tmp;
  tmp = intBuffer[0];
  intBuffer[0] = intBuffer[3];
  intBuffer[3] = tmp;
  tmp = intBuffer[1];
  intBuffer[1] = intBuffer[2];
  intBuffer[2] = tmp;
  // -----------------------------
  cout << *((int*)intBuffer) << endl;

  myfile.close();
  return 0;
}

Best, U.


This doesn't depend only on the machine you are working on. If the Fortran infrastructure writes integers in big-endian order instead of little-endian, you'll have to deal with that no matter what the OS is.

I'd suggest using the ntohl() and ntohs() functions, which are clearer than your hand-rolled exchanging routine.


Whatever the format is, it will obviously be consistent across machines (it would be kind of funny if you couldn't open a file on another machine).

Therefore both the byte ordering and the data type sizes have to be defined in the format, and when you want to read such a format, you need to work with those byte orders and data type sizes.


It's not uncommon for software to adopt a specific byte order to make the binary files more portable, even if the software doesn't support other platforms yet, or might never. Similarly, software may use a serialisation library that's designed for portability. Routines like ntohl() may help you restore the order you want.


Maybe the software does this "strange" operation in order to support both little- and big-endian architectures (whose byte orders differ).

Conclusion:

  • On two different machines (little-endian vs. big-endian), writing the same binary information with the same input can produce different files.


Some file formats require the bytes to be in a single order, normally big endian, since that's network order. So on little-endian x86s those files have their ints byte-swapped when written and swapped back when read.


This is the endianness problem. Intel CPUs use little endian; "network byte order", SPARC, and Motorola use big endian. Many legacy and portable applications save files in big endian for interoperability.


There are some well-known cases where you voluntarily force one byte order: when the data is intended to be exchanged between machines whose endianness is unknown in advance, as over a network. That is why there are C primitives like ntohl and htonl: if the network endianness is the same as the machine's, they do nothing; otherwise they swap bytes. Something similar could be involved here if the files are supposed to be exchanged between machines.

But the real question is: does the same byte swapping also occur in the data blocks? If not, there is indeed something strange, and the 0 could just be padding, not part of the format at all. If the byte swapping also occurs in the data blocks, it is probably done on purpose.

The most portable solution is certainly to read the file byte by byte and assemble your data by hand; that way you can even handle integers larger than uint32_t.

Also be ready for some trouble when reading doubles, as their byte order is probably swapped too, and they are not as easy to assemble by hand.

The code below should work as a template for any type whose endianness you want to change, including double.

#include <stdio.h>
#include <arpa/inet.h>
#include <stdint.h>

template <class builtin>
builtin ntoh(const builtin input) {
    if (ntohs(1) != 1) {  // host is little endian: swap
        union {
            char buffer[sizeof(builtin)];
            builtin data;
        } in, out;
        in.data = input;
        for (size_t i = 0; i < sizeof(builtin); i++) {
            out.buffer[i] = in.buffer[sizeof(builtin) - i - 1];
        }
        return out.data;
    }
    return input;         // host is big endian: nothing to do
}

int main() {
    printf("78563412 expected, got: output= %x\n", ntoh<uint32_t>(0x12345678));
}

It will not provide the best performance; look here for better-performing approaches for the native types.


htonl (host to network long) and htons (host to network short) will go from whatever platform you are on to big-endian. That was because back in those days most network hosts ran a form of UNIX that used native big-endian.

ntohl and ntohs will convert big endian to native, regardless of your platform. If you are on a big-endian platform, these will be no-ops.

Aside from byte order, the other potential portability issue is the size of short and long. ntohl reads 4 bytes and converts them to a 32-bit integer; the target int must therefore be at least 32 bits to hold it, but it doesn't need to be exactly that length. ntohs reads 2 bytes and converts them to a 16-bit short int. Note that if your native platform uses more than 32 bits for long or 16 bits for short, you have to manage the "sign" issue for signed integers (because the actual types in the ntohl interface are unsigned).

As more machines, now including Linux ones, use Intel processors with little-endian notation, it is a lot more frequent now to use that as the "default" format and convert big-endian data instead. In that case you may want to write macros of your own to convert to little-endian (on an already little-endian platform they would be no-ops).

For actually reversing bytes you can use std::reverse, by the way; you will need two pointers, one pointing to the first byte and the other one past the last byte.

You can also implement a "byte-swap"; in that case your right pointer should point at the last byte, not one past the end. You byteswap like this:

void byteswap( unsigned char & byte1, unsigned char & byte2 )
{
   // XOR swap; note it zeroes the byte if byte1 and byte2 refer to
   // the same object, so never call it with the same byte twice.
   byte1 ^= byte2;
   byte2 ^= byte1;
   byte1 ^= byte2;
}

To implement this in C (rather than C++) you would use pointers rather than references as the parameters.

In the actual example you have given, the file appears to be stored in 32-bit big-endian (i.e. network) byte order per its specification, so here you can simply use ntohl; note, however, that ntohl takes an unsigned int as its parameter. So correct your code to:

uint32_t count = 0;
myfile.open ("truss.rst", ios::binary);
myfile.read(reinterpret_cast<char*>(&count), sizeof(uint32_t)); 
   // ideally validate that the read succeeded
count = ntohl( count );

One of the weaknesses of iostream, in my opinion, is that you have to do that cast. Whoever wrote it never really liked the concept of binary I/O. Of course, if you are writing this in C rather than C++, you would use FILE* and fread.

