Simple data serialization in C
I am currently re-designing an application and stumbled upon a problem serializing some data.
Say I have an m x n array
double **data;
that I want to serialize into a
char *dataSerialized
using simple delimiters (one for rows, one for elements).
De-serialization is fairly straightforward, counting delimiters and allocating size for the data to be stored. However, what about the serialize function, say
serialize_matrix(double **data, int m, int n, char **dataSerialized);
What would be the best strategy to determine the size needed by the char array and allocate the appropriate memory for it?
Perhaps using some fixed-width exponential representation of doubles in a string? Is it possible to just copy the raw bytes of the doubles into chars and have a sizeof(double)-aligned char array? How would I keep the accuracy of the numbers intact?
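For the delimited-text idea, one sizing trick is to let snprintf count for you: a first pass with a NULL buffer returns the length each number needs, and "%.17g" prints enough decimal digits to round-trip a finite IEEE 754 double exactly. A minimal sketch of that approach (the function name and delimiters here are made up for illustration):
#include <stdio.h>
#include <stdlib.h>

/* Sketch: serialize m x n doubles into a text buffer, ',' between elements
   and ';' after each row. "%.17g" keeps enough digits to reconstruct the
   exact double on the other side (for finite values). */
char *serialize_matrix_text(double **data, int m, int n)
{
    size_t len = 1;                                    /* terminating NUL */
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            len += snprintf(NULL, 0, "%.17g", data[i][j]) + 1;  /* +1 delimiter */

    char *out = malloc(len);
    if (!out) return NULL;

    char *p = out;
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            p += sprintf(p, "%.17g", data[i][j]);
            *p++ = (j < n - 1) ? ',' : ';';
        }
    }
    *p = '\0';
    return out;
}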
NOTE:
I need the data in a char array, not in binary, not in a file.
The serialized data will be sent over the network using ZeroMQ between a C server and a Java client. Would it be possible, given the array dimensions and sizeof(double) that it can always be accurately reconstructed between those two?
Java has pretty good support for reading raw bytes and converting into whatever you want. You can decide on a simple wire-format, and then serialize to this in C, and unserialize in Java.
Here's an example of an extremely simple format, with code to unserialize and serialize.
I've written a slightly larger test program that I can dump somewhere if you want; it creates a random data array in C, serializes it, and writes the serialized string base64-encoded to stdout. The much smaller Java program then reads, decodes, and deserializes this.
C code to serialize:
/*
I'm using this format:

[number of elements in outer array]   32-bit signed int
[number of elements in inner array]   32-bit signed int
[elements]                            see below

[elements] is built like
[element(0,0)][element(0,1)]...[element(0,y)][element(1,0)]...

Each element is sent as a 64-bit IEEE 754 "double". If your C compiler/architecture does something different with its "double"s, look forward to hours of fun :)

I'm using a couple of non-standard byte-swapping functions here (htobe32/htobe64 from <endian.h>), originally from a BSD, but present in glibc >= 2.9.
*/

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <endian.h>
/* Calculate the bytes required to store a message of x*y doubles */
size_t calculate_size(size_t x, size_t y)
{
    /* The two dimensions of the array - each in 32 bits - (2 * 4) */
    size_t sz = 8;
    /* a 64-bit IEEE 754 double is by definition 8 bytes long :) */
    sz += ((x * y) * 8);
    /* and a NUL */
    sz++;
    return sz;
}
/* Helpers */
static char* write_int32(int32_t, char*);
static char* write_double(double, char*);
/* Actual conversion. That wasn't so hard, was it? */
void convert_data(double** src, size_t x, size_t y, char* dst)
{
    dst = write_int32((int32_t) x, dst);
    dst = write_int32((int32_t) y, dst);
    for(size_t i = 0; i < x; i++) {
        for(size_t j = 0; j < y; j++) {
            dst = write_double(src[i][j], dst);
        }
    }
    *dst = '\0';
}
static char* write_int32(int32_t num, char* c)
{
    char* byte;
    int i = sizeof(int32_t);
    /* Convert to network byte order */
    num = htobe32(num);
    byte = (char*) (&num);
    while(i--) {
        *c++ = *byte++;
    }
    return c;
}
static char* write_double(double d, char* c)
{
    /* Here I'm assuming your C programs use IEEE 754 'double' precision natively.
       If you don't, you should be able to convert into this format. A helper library
       most likely already exists for your platform.
       Note that IEEE 754 endianness isn't defined, but in practice, normal platforms
       use the same byte order as they do for integers. */
    char* byte;
    int i = sizeof(uint64_t);
    uint64_t num;
    /* memcpy rather than a pointer cast, to avoid strict-aliasing trouble */
    memcpy(&num, &d, sizeof(num));
    /* convert to network byte order */
    num = htobe64(num);
    byte = (char*) (&num);
    while(i--) {
        *c++ = *byte++;
    }
    return c;
}
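To tie it together on the sending side, a rough sketch (assuming the ZeroMQ 3.x+ C API; zmq_send and the helper name send_matrix are just for illustration, and error handling is omitted):
#include <zmq.h>
#include <stdlib.h>

/* Sketch: serialize `data` (x rows, y columns) and push it to a connected peer. */
void send_matrix(void *zmq_socket, double **data, size_t x, size_t y)
{
    size_t len = calculate_size(x, y);
    char *buf = malloc(len);
    if (!buf) return;

    convert_data(data, x, y, buf);
    /* Send the raw bytes; len - 1 drops the trailing NUL, which the Java side
       doesn't need. */
    zmq_send(zmq_socket, buf, len - 1, 0);
    free(buf);
}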
Java code to unserialize:
/* The raw char array from C is now read into the byte[] `bytes` in Java
   (needs java.io.DataInputStream, java.io.ByteArrayInputStream, java.io.IOException) */
DataInputStream stream = new DataInputStream(new ByteArrayInputStream(bytes));

int dim_x; int dim_y;
double[][] data;

try {
    dim_x = stream.readInt();
    dim_y = stream.readInt();
    data = new double[dim_x][dim_y];
    for(int i = 0; i < dim_x; ++i) {
        for(int j = 0; j < dim_y; ++j) {
            data[i][j] = stream.readDouble();
        }
    }

    System.out.println("Client:");
    System.out.println("Dimensions: "+dim_x+" x "+dim_y);
    System.out.println("Data:");
    for(int i = 0; i < dim_x; ++i) {
        for(int j = 0; j < dim_y; ++j) {
            System.out.print(" "+data[i][j]);
        }
        System.out.println();
    }
} catch(IOException e) {
    System.err.println("Error reading input");
    System.err.println(e.getMessage());
    System.exit(1);
}
If you are writing a binary file, you should think of a good way to serialize the actual binary data (64 bits) of your doubles. This could range from directly writing the content of the double to the file (minding endianness) to some more elaborate normalizing serialization scheme (e.g. with a well-defined representation of NaN). That's up to you, really. If you expect to be basically among homogeneous architectures, a direct memory dump would probably suffice (see the sketch below).
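A minimal sketch of that direct-dump option (assuming both ends share endianness and an IEEE 754 double representation; the function name is made up):
#include <stdio.h>

/* Sketch: dump an array of n doubles straight into a file. Only safe when
   the reading machine uses the same endianness and double representation. */
int dump_doubles(const char *path, const double *values, size_t n)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(values, sizeof(double), n, f);
    fclose(f);
    return written == n ? 0 : -1;
}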
If you want to write to a text file and are looking for an ASCII representation, I would strongly discourage a decimal numerical representation. Instead, you could convert the 64-bit raw data to ASCII using base64 or something like that (sketched below).
You really want to keep all the precision that you have in your double!
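A rough sketch of the base64 idea for a single double (the hand-rolled encoder and the function name are just for illustration; in practice an existing base64 library is the better choice):
#include <stdint.h>
#include <string.h>

static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Sketch: base64-encode the raw 8 bytes of one double into `out`
   (needs room for 12 chars + NUL). No precision is lost because the
   bit pattern itself is transported, not a decimal rendering. */
void double_to_base64(double d, char out[13])
{
    unsigned char bytes[9] = {0};          /* 8 data bytes + 1 zero pad byte */
    memcpy(bytes, &d, sizeof(double));
    int o = 0;
    for (int i = 0; i < 9; i += 3) {
        uint32_t chunk = (bytes[i] << 16) | (bytes[i+1] << 8) | bytes[i+2];
        out[o++] = b64[(chunk >> 18) & 0x3F];
        out[o++] = b64[(chunk >> 12) & 0x3F];
        out[o++] = b64[(chunk >> 6) & 0x3F];
        out[o++] = b64[chunk & 0x3F];
    }
    out[11] = '=';                         /* standard base64 padding */
    out[12] = '\0';
}
On the Java side, java.util.Base64 can decode the text back into the original 8 bytes, and ByteBuffer.getDouble() (or Double.longBitsToDouble) rebuilds the exact value.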