Getting at C binary data from OCaml
(Ignoring endianness for the sake of argument - this is just a test case/proof of concept - and I would never use strcpy
in real code either!)
Consider the following trivial C code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* variables of type message_t will be stored contiguously in memory */
typedef struct {
    int message_id;
    char message_text[80];
} message_t;

int main(int argc, char**argv) {
    message_t* m = (message_t*)malloc(sizeof(message_t));
    m->message_id = 1;
    strcpy(m->message_text,"the rain in spain falls mainly on the plain");
    /* write the memory to disk */
    FILE* fp = fopen("data.dat", "wb");
    fwrite((void*)m, sizeof(int) + strlen(m->message_text) + 1, 1, fp);
    fclose(fp);
    exit(EXIT_SUCCESS);
}
The file it writes can easily be read back in from disk:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int message_id;
    char message_text[80];
} message_t;

int main(int argc, char**argv) {
    message_t* m = (message_t*)malloc(sizeof(message_t));
    FILE* fp = fopen("data.dat", "rb");
    fread((void*)m, sizeof(message_t), 1, fp);
    fclose(fp);
    /* block of memory has structure "overlaid" onto it */
    printf("message_id=%d, message_text='%s'\n", m->message_id, m->message_text);
    exit(EXIT_SUCCESS);
}
E.g.
$ ./write
$ ./read
message_id=1, message_text='the rain in spain falls mainly on the plain'
My question is, in OCaml, if all I have is:
type message_t = {message_id:int; message_text:string}
How would I get at that data? Marshal can't do it, nor can input_binary_int. I can call out to helper functions in C like "what is sizeof(int)" then get n bytes and call a C function to "convert these bytes into an int", for example, but in this case I can't add any new C code; the "unpacking" has to be done in OCaml, based on what I know it "should" be. Is it just a matter of iterating over the string, either in blocks of sizeofs or looking for '\0', or is there a clever way? Thanks!
For doing this kind of low level struct handling, I find OCaml Bitstring very convenient. The equivalent reader for your message_t would be this if you wrote all 80 characters to disk:
bitmatch (Bitstring.bitstring_from_file "data.dat") with
| { message_id : 32;
    message_text : 8 * 80 : string
  } ->
    Printf.printf "message_id=%ld, message_text='%s'\n"
      message_id message_text
| { _ } -> failwith "Not a valid message_t"
As is, you'll have to trim message_text at the first NUL byte, but bitstring may be what you want for this kind of task in general.
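For the trimming step, here is a minimal sketch in plain OCaml. The helper name trim_at_nul is made up for illustration, and it assumes the field is padded with NUL bytes after the text:

```ocaml
(* Hypothetical helper: keep everything before the first NUL byte.
   If there is no NUL, the string is returned unchanged. *)
let trim_at_nul (s : string) : string =
  match String.index_opt s '\000' with
  | Some i -> String.sub s 0 i
  | None -> s

let () =
  (* Simulate an 80-byte field: the text followed by NUL padding. *)
  let text = "the rain in spain" in
  let field = text ^ String.make (80 - String.length text) '\000' in
  Printf.printf "'%s'\n" (trim_at_nul field)
```

You would apply it to message_text inside the match arm before printing.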
Before you can figure out how to code this in OCaml, you need to figure out what your data representation is. Your C code isn't consistent between the reader and the writer: the writer only writes strlen(m->message_text)+1 bytes for the string, whereas the reader expects the full 80 bytes.
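To make the mismatch concrete, here is a small sketch of the arithmetic. It assumes sizeof(int) is 4 and that the compiler adds no padding to the struct, which is typical but not guaranteed; the names are only for illustration:

```ocaml
(* Assumptions: sizeof(int) = 4 and no struct padding. *)
let sizeof_int = 4
let text_capacity = 80

(* Bytes the C writer emits: the int, the text, and the terminating NUL. *)
let bytes_written text = sizeof_int + String.length text + 1

(* Bytes the C reader asks fread for: the whole struct. *)
let bytes_expected = sizeof_int + text_capacity

let () =
  let text = "the rain in spain falls mainly on the plain" in
  Printf.printf "written=%d expected=%d\n" (bytes_written text) bytes_expected
```

Under those assumptions the file is shorter than the struct the reader requests, so the reader only appears to work because fread still fills in the bytes that are there.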
My advice is to do all your marshalling in the same language, either C or OCaml. I recommend OCaml's marshalling library (the Marshal module), which is already working, cross-platform and easy to use.
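If both sides can be OCaml, a round trip through Marshal might look like the sketch below. The save and load helpers and the temporary file are made up for the example, and the resulting file is in OCaml's own format, so the C program above cannot read it:

```ocaml
type message_t = { message_id : int; message_text : string }

(* Write one value in OCaml's own marshalling format. *)
let save path (m : message_t) =
  let oc = open_out_bin path in
  Marshal.to_channel oc m [];
  close_out oc

(* Read it back. Marshal is not type-safe, so the annotation is a
   promise by the programmer, not something the runtime checks. *)
let load path : message_t =
  let ic = open_in_bin path in
  let m : message_t = Marshal.from_channel ic in
  close_in ic;
  m

let () =
  let path = Filename.temp_file "message" ".bin" in
  save path { message_id = 1; message_text = "the rain in spain" };
  let m = load path in
  Printf.printf "message_id=%d, message_text='%s'\n" m.message_id m.message_text;
  Sys.remove path
```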
If you need interoperability between C and OCaml marshalling code, then you need to settle on a marshalling format, and implement that same specification in both languages. Before you do that, consider if you can use a text representation, which will be less error-prone and easier to inspect and manipulate with third-party tools, but bulkier. JSON is a lightweight data representation format, or you can turn to the heavyweight XML. If all your data is truly as simple as an integer and a string, and the strings don't contain newlines, you can write the integer in decimal followed by a space (or a : or a ,) followed by the string followed by a newline.
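The OCaml side of that line-based format could be sketched as follows. The function names to_line and of_line are hypothetical, and the C side would have to print and parse the same layout:

```ocaml
type message_t = { message_id : int; message_text : string }

(* One record per line: decimal id, a space, then the text. *)
let to_line m = Printf.sprintf "%d %s" m.message_id m.message_text

(* Split at the first space only, since the text may contain spaces.
   Returns None if the line has no space or the id is not an integer. *)
let of_line line =
  match String.index_opt line ' ' with
  | None -> None
  | Some i ->
    (match int_of_string_opt (String.sub line 0 i) with
     | None -> None
     | Some message_id ->
       let message_text =
         String.sub line (i + 1) (String.length line - i - 1) in
       Some { message_id; message_text })

let () =
  let line = to_line { message_id = 1; message_text = "the rain in spain" } in
  print_endline line
```

Reading a file in this format is then a matter of calling input_line and passing each line to of_line.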
If the C marshalling format is predefined and you can't change it, note that it's platform-dependent (depends on the architecture and the C compiler), and OCaml doesn't give you access to such platform details. So your best bet is to link your OCaml program with a C helper, making sure that your helper uses the same C type representation (sizeof(int), endianness, structure padding) as the original application.
You are relying on using the same C compiler on the same platform to avoid having to think about what the format of the written and read back data is. Unfortunately you don't have that luxury if you are trying to interoperate between C and OCaml. You have to count the bytes in the structure, figure out if the integer is little- or big-endian, and code accordingly on the OCaml side.
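As a rough sketch of what "code accordingly" can mean, the decoder below works on a buffer that already holds the file contents. It assumes a 4-byte little-endian int, no padding before message_text, and OCaml 4.08 or later for Bytes.get_int32_le; decode_message is a made-up name:

```ocaml
(* Assumptions: 4-byte little-endian int at offset 0, text at offset 4,
   OCaml >= 4.08. On 64-bit OCaml, Int32.to_int keeps the full value. *)
let decode_message (buf : bytes) : int * string =
  let id = Int32.to_int (Bytes.get_int32_le buf 0) in
  (* The text runs to the first NUL, or to the end of the buffer if the
     writer emitted no terminator. *)
  let stop =
    match Bytes.index_from_opt buf 4 '\000' with
    | Some i -> i
    | None -> Bytes.length buf
  in
  (id, Bytes.sub_string buf 4 (stop - 4))

let () =
  (* The bytes a little-endian C writer would produce for id 1 and "hi". *)
  let buf = Bytes.of_string "\001\000\000\000hi\000" in
  let id, text = decode_message buf in
  Printf.printf "message_id=%d, message_text='%s'\n" id text
```

For a big-endian file you would swap in Bytes.get_int32_be and leave the rest unchanged.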
You'll have to manually unmarshall each type separately, in effect parsing the binary file. For instance, to read a little-endian 32-bit integer you'd have to use:
let input_le_int32 inch =
  let res = ref 0l in
  for i = 0 to 3 do
    let byte = input_byte inch in
    res := Int32.logor !res (Int32.shift_left (Int32.of_int byte) (8*i))
  done;
  !res
and to read a NUL-terminated string:
let input_c_string inch =
  let res = Buffer.create 256 in
  try
    while true do
      let byte = input_byte inch in
      if byte = 0 then raise Exit else
        Buffer.add_char res (char_of_int byte)
    done;
    assert false
  with Exit ->
    Buffer.contents res
If everything is right you can read back your structure with:
let input_message inch =
  (* message_id is declared as int, so convert the int32 that was read *)
  let message_id = Int32.to_int (input_le_int32 inch) in
  let message_text = input_c_string inch in
  { message_id; message_text; }
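Putting the pieces together, a complete program might look like the sketch below. The read_message wrapper is an addition for illustration, the string reader is written as a recursive loop instead of using an exception, and the file is generated from OCaml as a stand-in for the C writer's output so the example runs on its own. Fun.protect needs OCaml 4.08 or later:

```ocaml
type message_t = { message_id : int; message_text : string }

let input_le_int32 inch =
  let res = ref 0l in
  for i = 0 to 3 do
    let byte = input_byte inch in
    res := Int32.logor !res (Int32.shift_left (Int32.of_int byte) (8 * i))
  done;
  !res

(* Read bytes up to the NUL terminator; raises End_of_file if the file
   ends before a NUL is seen. *)
let input_c_string inch =
  let res = Buffer.create 256 in
  let rec loop () =
    match input_byte inch with
    | 0 -> Buffer.contents res
    | byte -> Buffer.add_char res (char_of_int byte); loop ()
  in
  loop ()

let input_message inch =
  let message_id = Int32.to_int (input_le_int32 inch) in
  let message_text = input_c_string inch in
  { message_id; message_text }

(* Hypothetical wrapper: open in binary mode and always close the channel. *)
let read_message path =
  let inch = open_in_bin path in
  Fun.protect ~finally:(fun () -> close_in inch)
    (fun () -> input_message inch)

let () =
  (* Stand-in for data.dat: little-endian 1, then a NUL-terminated string. *)
  let path = Filename.temp_file "data" ".dat" in
  let oc = open_out_bin path in
  output_string oc "\001\000\000\000the rain in spain\000";
  close_out oc;
  let m = read_message path in
  Printf.printf "message_id=%d, message_text='%s'\n" m.message_id m.message_text;
  Sys.remove path
```

Opening with open_in_bin matters on platforms that translate line endings in text mode.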
Note: it is imperative (!) to sequence the reads to avoid reading fields out-of-order. Do not use parallel let assignments (let ... and ...), since their evaluation order is unspecified.
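To see why, here is a small sketch that logs the order in which two fake "reads" run. The read function and the log are invented for the demonstration. The same caveat applies to calling the readers directly inside a record expression, where the order of field evaluation is also unspecified:

```ocaml
(* Record which "read" ran first by logging names into a list. *)
let log : string list ref = ref []
let read name value = log := name :: !log; value

(* Sequential lets: the id is always read before the text. *)
let sequenced () =
  log := [];
  let id = read "id" 1 in
  let text = read "text" "hello" in
  ignore (id, text);
  List.rev !log

(* Parallel let: the language leaves the order unspecified, so this may
   or may not match the order above, depending on the compiler. *)
let parallel () =
  log := [];
  let id = read "id" 1 and text = read "text" "hello" in
  ignore (id, text);
  List.rev !log

let () =
  Printf.printf "sequenced: %s\n" (String.concat ", " (sequenced ()));
  Printf.printf "parallel:  %s\n" (String.concat ", " (parallel ()))
```

With real channel reads, the wrong order would mean parsing the string bytes as the integer.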
Thanks for advice all; I have written up the approach I decided to take in my blog.