Best way to read/parse an untyped binary file in Delphi

I would like to know the best way to parse an untyped binary file, for example an EBML file (http://ebml.sourceforge.net/). EBML is basically a binary XML format. It can store basically anything, but its predominant use right now is MKV video files (Matroska).

I want to read an EBML file at the byte level: read the header, make sure it really is an EBML file, and retrieve information about the file. MKV files can be huge, 1-30 GB in size.

The binary file could be anything: JPEG, BMP, AVI, etc. I just want to learn how to read them.


Basically, you do

const
  MAGIC_WORD = $535B;

type
  TMyFileTypeHeader = packed record
    MagicWord: word; // = MAGIC_WORD
    Size: cardinal;
    Version: cardinal;
    Width: cardinal;
    Height: cardinal;
    ColorDepth: cardinal;
    Title: array[0..31] of AnsiChar; // AnsiChar is always 1 byte; Char is 2 bytes in Delphi 2009+
  end;

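procedure ReadFile(const FileName: string);
var
  f: file;
  amt: integer;
  FileHeader: TMyFileTypeHeader;
begin
  FileMode := fmOpenRead; // open read-only (fmOpenRead is declared in SysUtils)
  AssignFile(f, FileName);
  Reset(f, 1); // record size 1, so BlockRead counts bytes

  try
    BlockRead(f, FileHeader, SizeOf(FileHeader), amt);

    // A short read means the file is too small to contain a header
    if (amt <> SizeOf(FileHeader)) or (FileHeader.MagicWord <> MAGIC_WORD) then
      raise Exception.CreateFmt('File "%s" is not a valid XXX file.', [FileName]);

    // Read, parse, and do something

  finally
    CloseFile(f); // only reached once Reset has succeeded
  end;
end;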

For instance, a bitmap file begins with a BITMAPFILEHEADER structure, followed (in version 3) by a BITMAPINFOHEADER, then an optional array of palette entries, and finally the pixel data. In the simplest case that data is uncompressed 24-bit RGB, stored as BBGGRRBBGGRRBBGGRR...
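To make the bitmap layout concrete, here is a minimal sketch that reads the two headers from a stream. The record names TBmpFileHeader and TBmpInfoHeader and the function ReadBmpHeaders are made up for this example; the records mirror the Windows structures so the code does not need the Windows unit. It assumes a version 3 header (biSize = 40) and a little-endian machine, and it does not touch the palette or the pixel data.

```pascal
program BmpHeaderSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Local mirror of BITMAPFILEHEADER (14 bytes when packed).
  TBmpFileHeader = packed record
    bfType: Word;          // 'BM', read as $4D42 on a little-endian machine
    bfSize: Cardinal;      // total file size in bytes
    bfReserved1: Word;
    bfReserved2: Word;
    bfOffBits: Cardinal;   // offset of the pixel data from the start of the file
  end;

  // Local mirror of the version 3 BITMAPINFOHEADER (40 bytes).
  TBmpInfoHeader = packed record
    biSize: Cardinal;      // 40 for a version 3 header
    biWidth: Integer;
    biHeight: Integer;     // negative means the rows are stored top-down
    biPlanes: Word;
    biBitCount: Word;      // 24 for plain BBGGRR pixels
    biCompression: Cardinal; // 0 = uncompressed (BI_RGB)
    biSizeImage: Cardinal;
    biXPelsPerMeter: Integer;
    biYPelsPerMeter: Integer;
    biClrUsed: Cardinal;
    biClrImportant: Cardinal;
  end;

// Reads both headers from the current stream position.
// Returns False if the data is too short or is not a version 3 bitmap.
function ReadBmpHeaders(Stream: TStream; out FileHdr: TBmpFileHeader;
  out InfoHdr: TBmpInfoHeader): Boolean;
begin
  Result := False;
  if Stream.Read(FileHdr, SizeOf(FileHdr)) <> SizeOf(FileHdr) then Exit;
  if FileHdr.bfType <> $4D42 then Exit;
  if Stream.Read(InfoHdr, SizeOf(InfoHdr)) <> SizeOf(InfoHdr) then Exit;
  Result := InfoHdr.biSize = SizeOf(InfoHdr);
end;

var
  Stream: TFileStream;
  FileHdr: TBmpFileHeader;
  InfoHdr: TBmpInfoHeader;
begin
  if ParamCount < 1 then
  begin
    Writeln('usage: BmpHeaderSketch <file.bmp>');
    Exit;
  end;
  Stream := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    if ReadBmpHeaders(Stream, FileHdr, InfoHdr) then
      Writeln(Format('%d x %d, %d bpp, pixel data at offset %d',
        [InfoHdr.biWidth, InfoHdr.biHeight, InfoHdr.biBitCount, FileHdr.bfOffBits]))
    else
      Writeln('Not a version 3 BMP file.');
  finally
    Stream.Free;
  end;
end.
```

Cardinal and Integer are used instead of LongWord and LongInt because the latter are not 32 bits wide on every Delphi target platform.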

Reading a JPG, on the other hand, is very complicated, because the JPG data is compressed in a way that requires a lot of advanced mathematics to even understand (I think -- I have actually never really dug into the JPG specs). At least, this is true for a lot of modern image file formats. BMP, on the other hand, is trivial -- the "worst" thing that can happen is that the image is RLE compressed.

The "details" of parsing a file depend entirely on the file format. The file format specification tells the developer how the data is stored in binary form (above, the two bitmap structures are part of the Windows bitmap specification). It is like a contract, signed (not literally) by all encoders/decoders of such files. In the case of EBML, the specification appears to be available here.
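For the EBML case from the question, the first step might look like the sketch below. An EBML document starts with the header element ID, the bytes 1A 45 DF A3, followed by the header's data size stored as a variable-length integer: the number of leading zero bits in the first byte gives the length, and the marker bit is dropped from the value. The names HasEbmlMagic and ReadEbmlSize are invented for this example, the reserved "unknown size" value is not handled, and the sample bytes are made up so that the program runs without a file. Because it works on any TStream, the same code could be pointed at a TFileStream, which reads only what is requested and so is not bothered by multi-gigabyte MKV files.

```pascal
program EbmlHeaderSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // ID of the EBML header element, in file byte order.
  EBML_MAGIC: array[0..3] of Byte = ($1A, $45, $DF, $A3);

// Returns True if the next four bytes of the stream are the EBML header ID.
function HasEbmlMagic(Stream: TStream): Boolean;
var
  Buf: array[0..3] of Byte;
begin
  Result := (Stream.Read(Buf, SizeOf(Buf)) = SizeOf(Buf)) and
    CompareMem(@Buf, @EBML_MAGIC, SizeOf(Buf));
end;

// Reads an EBML variable-length data size (1 to 8 bytes).
function ReadEbmlSize(Stream: TStream): Int64;
var
  First, Next, Mask: Byte;
  Len, I: Integer;
begin
  Stream.ReadBuffer(First, 1);
  if First = 0 then
    raise Exception.Create('Invalid EBML size: longer than 8 bytes');
  // Count leading zero bits to find the total length in bytes.
  Len := 1;
  Mask := $80;
  while (First and Mask) = 0 do
  begin
    Inc(Len);
    Mask := Mask shr 1;
  end;
  Result := First and (Mask - 1); // drop the length marker bit
  for I := 2 to Len do
  begin
    Stream.ReadBuffer(Next, 1);
    Result := (Result shl 8) or Next;
  end;
end;

const
  // Made-up sample: header ID followed by a one-byte size ($9F = marker + 31).
  Sample: array[0..4] of Byte = ($1A, $45, $DF, $A3, $9F);
var
  Stream: TMemoryStream;
begin
  Stream := TMemoryStream.Create;
  try
    Stream.WriteBuffer(Sample, SizeOf(Sample));
    Stream.Position := 0;
    if HasEbmlMagic(Stream) then
      Writeln('EBML header, payload size ', ReadEbmlSize(Stream), ' bytes')
    else
      Writeln('Not an EBML file.');
  finally
    Stream.Free;
  end;
end.
```

Run as written, the sample prints `EBML header, payload size 31 bytes`.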


Just use a TFileStream, like so ...

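var
  MyFile: TStream;
begin
  // The file name comes first, the open mode second
  MyFile := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    // Read stuff; ReadBuffer raises EReadError if fewer bytes are available
    MyFile.ReadBuffer(MyVariable, SizeOf(MyVariable));
    // etc.
  finally
    MyFile.Free;
  end;
end;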


You could memory map the file. Then you can access it as if you were accessing memory. See http://msdn.microsoft.com/en-us/library/aa366556(VS.85).aspx
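A rough sketch of the memory-mapping approach on Windows follows. It uses the CreateFile, CreateFileMapping, and MapViewOfFile API functions from the Windows unit (Winapi.Windows in newer Delphi versions, unless unit scope names are configured); the function name FirstFourBytesHex is made up for this example. Only a 4-byte view at offset 0 is mapped, since a 32-bit process cannot map a 1-30 GB file in one piece. To read further into the file you would map other windows, keeping in mind that view offsets must be a multiple of the system allocation granularity.

```pascal
program MapHeaderSketch;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Maps the first four bytes of a file read-only and returns them as hex text.
function FirstFourBytesHex(const FileName: string): string;
var
  hFile, hMap: THandle;
  View: PByteArray;
begin
  hFile := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ, nil,
    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if hFile = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    // A maximum size of 0/0 means "the current size of the file".
    hMap := CreateFileMapping(hFile, nil, PAGE_READONLY, 0, 0, nil);
    if hMap = 0 then
      RaiseLastOSError; // also fails for an empty file
    try
      // Map a 4-byte window starting at file offset 0.
      View := MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 4);
      if View = nil then
        RaiseLastOSError;
      try
        // The view is now ordinary memory and can be indexed like an array.
        Result := Format('%.2x %.2x %.2x %.2x',
          [View^[0], View^[1], View^[2], View^[3]]);
      finally
        UnmapViewOfFile(View);
      end;
    finally
      CloseHandle(hMap);
    end;
  finally
    CloseHandle(hFile);
  end;
end;

begin
  if ParamCount < 1 then
    Writeln('usage: MapHeaderSketch <file>')
  else
    Writeln(FirstFourBytesHex(ParamStr(1)));
end.
```

For an MKV or other EBML file, the four bytes printed should be the EBML header ID, 1A 45 DF A3.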
