Best way to read/parse an untyped binary file in Delphi
I would like to know the best way to parse an untyped binary file, for example an EBML file (http://ebml.sourceforge.net/). EBML is basically a binary XML file. It can store basically anything, but its predominant use right now is in MKV (Matroska) video files.
I want to read an EBML file at the byte level: read the header, make sure it is an EBML file, and retrieve information about the file. MKV files can be huge, 1-30 GB in size.
The binary file could be anything: JPEG, BMP, AVI, etc. I just want to learn how to read them.
Basically, you do something like this:
uses
  SysUtils;

const
  MAGIC_WORD = $535B;

type
  TMyFileTypeHeader = packed record
    MagicWord: Word;        // = MAGIC_WORD
    Size: Cardinal;
    Version: Cardinal;
    Width: Cardinal;
    Height: Cardinal;
    ColorDepth: Cardinal;
    Title: array[0..31] of AnsiChar;  // AnsiChar: 1 byte per character, even in Unicode Delphi
  end;

procedure ReadFile(const FileName: string);
var
  f: file;
  amt: Integer;
  FileHeader: TMyFileTypeHeader;
begin
  FileMode := fmOpenRead;     // open read-only
  AssignFile(f, FileName);
  Reset(f, 1);                // record size 1, so BlockRead counts bytes
  try
    BlockRead(f, FileHeader, SizeOf(TMyFileTypeHeader), amt);
    // Reject files that are too short or lack the magic word
    if (amt <> SizeOf(TMyFileTypeHeader)) or (FileHeader.MagicWord <> MAGIC_WORD) then
      raise Exception.CreateFmt('File "%s" is not a valid XXX file.', [FileName]);
    // Read, parse, and do something
  finally
    CloseFile(f);             // only reached once Reset has succeeded
  end;
end;
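One caveat with reading a record straight from the file: multi-byte fields come out right only if the file stores them in the CPU's byte order, which is little-endian on x86/x64. Many formats, including EBML and JPEG, store integers big-endian, so they have to be assembled byte by byte. A minimal sketch, where BEToUInt, ReadWordBE and ReadCardinalBE are hypothetical helper names, not RTL functions:

```pascal
program BigEndian;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Combines Count bytes (at most 4), most significant byte first,
// starting at B[Offset], into an unsigned value.
function BEToUInt(const B: array of Byte; Offset, Count: Integer): Cardinal;
var
  I: Integer;
begin
  Result := 0;
  for I := Offset to Offset + Count - 1 do
    Result := (Result shl 8) or B[I];
end;

// Reads a 16-bit big-endian value from a stream.
function ReadWordBE(S: TStream): Word;
var
  B: array[0..1] of Byte;
begin
  S.ReadBuffer(B, SizeOf(B));
  Result := BEToUInt(B, 0, 2);
end;

// Reads a 32-bit big-endian value from a stream.
function ReadCardinalBE(S: TStream): Cardinal;
var
  B: array[0..3] of Byte;
begin
  S.ReadBuffer(B, SizeOf(B));
  Result := BEToUInt(B, 0, 4);
end;

const
  Sample: array[0..5] of Byte = ($12, $34, $1A, $45, $DF, $A3);

var
  M: TMemoryStream;
begin
  M := TMemoryStream.Create;
  try
    M.WriteBuffer(Sample, SizeOf(Sample));
    M.Position := 0;
    Writeln(IntToHex(Integer(ReadWordBE(M)), 4));    // 1234
    Writeln(IntToHex(Int64(ReadCardinalBE(M)), 8));  // 1A45DFA3
  finally
    M.Free;
  end;
end.
```

For fields wider than 4 bytes the same loop works with an Int64 result.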
For instance, a bitmap file begins with a BITMAPFILEHEADER structure, followed (in version 3) by a BITMAPINFOHEADER, then an optional array of palette entries, and then the uncompressed RGB pixel data (in the simplest case, here in 24-bit format): BBGGRRBBGGRRBBGGRR...
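A minimal sketch of reading those two headers with a TFileStream. The records mirror the layouts of BITMAPFILEHEADER (14 bytes) and BITMAPINFOHEADER (40 bytes); Delphi's Windows unit already declares them as TBitmapFileHeader and TBitmapInfoHeader, but they are redeclared here under hypothetical names so the example is self-contained. It also computes the row stride, since each pixel row in a BMP is padded to a multiple of 4 bytes.

```pascal
program BmpHeaderInfo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Same layout as BITMAPFILEHEADER (little-endian, 14 bytes)
  TBmpFileHeader = packed record
    bfType: Word;          // 'BM' reads as $4D42 on little-endian CPUs
    bfSize: Cardinal;      // total file size in bytes
    bfReserved1: Word;
    bfReserved2: Word;
    bfOffBits: Cardinal;   // offset from the start of the file to the pixels
  end;

  // Same layout as BITMAPINFOHEADER (40 bytes)
  TBmpInfoHeader = packed record
    biSize: Cardinal;
    biWidth: Integer;
    biHeight: Integer;     // positive = bottom-up rows, negative = top-down
    biPlanes: Word;
    biBitCount: Word;
    biCompression: Cardinal;  // 0 = BI_RGB, 1 = BI_RLE8, 2 = BI_RLE4
    biSizeImage: Cardinal;
    biXPelsPerMeter: Integer;
    biYPelsPerMeter: Integer;
    biClrUsed: Cardinal;
    biClrImportant: Cardinal;
  end;

// Bytes per pixel row, rounded up to a multiple of 4
function RowStride(Width, BitCount: Integer): Integer;
begin
  Result := ((Width * BitCount + 31) div 32) * 4;
end;

var
  S: TFileStream;
  FH: TBmpFileHeader;
  IH: TBmpInfoHeader;
begin
  if ParamCount < 1 then
  begin
    Writeln('usage: BmpHeaderInfo <file.bmp>');
    Exit;
  end;
  S := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    S.ReadBuffer(FH, SizeOf(FH));
    if FH.bfType <> $4D42 then
      raise Exception.Create('Not a BMP file (missing "BM" signature)');
    S.ReadBuffer(IH, SizeOf(IH));
    Writeln('Size: ', IH.biWidth, ' x ', Abs(IH.biHeight), ', ', IH.biBitCount, ' bpp');
    Writeln('Compression: ', IH.biCompression);
    Writeln('Row stride: ', RowStride(IH.biWidth, IH.biBitCount), ' bytes');
    // Pixel rows (BGR order for 24-bit) start here
    S.Position := FH.bfOffBits;
  finally
    S.Free;
  end;
end.
```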
Reading a JPG, on the other hand, is very complicated, because the JPG data is compressed in a way that requires a lot of advanced mathematics even to understand (I think; I have never really dug into the JPG specs). The same is true of many modern image file formats. BMP, by contrast, is trivial: the "worst" thing that can happen is that the image is RLE compressed.
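Even when a format is hard to decode, like JPEG, recognizing it is usually easy, because most formats begin with a fixed signature ("magic number"). A minimal sketch that checks a few signatures for the formats mentioned in the question: "BM" for BMP, FF D8 FF for JPEG, 1A 45 DF A3 for EBML, and "RIFF" plus the form type "AVI " for AVI. SniffBytes and SniffFile are hypothetical names, and a matching signature only suggests, not proves, that the rest of the file is valid.

```pascal
program SniffFormat;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Guesses a file format from its leading "magic" bytes.
// Only a few formats mentioned in the question are recognized.
function SniffBytes(const B: array of Byte): string;
begin
  if (Length(B) >= 12) and (B[0] = Ord('R')) and (B[1] = Ord('I')) and
     (B[2] = Ord('F')) and (B[3] = Ord('F')) and (B[8] = Ord('A')) and
     (B[9] = Ord('V')) and (B[10] = Ord('I')) and (B[11] = Ord(' ')) then
    Result := 'AVI'     // RIFF container with form type 'AVI '
  else if (Length(B) >= 4) and (B[0] = $1A) and (B[1] = $45) and
          (B[2] = $DF) and (B[3] = $A3) then
    Result := 'EBML'    // e.g. MKV or WebM
  else if (Length(B) >= 3) and (B[0] = $FF) and (B[1] = $D8) and (B[2] = $FF) then
    Result := 'JPEG'    // SOI marker followed by another marker
  else if (Length(B) >= 2) and (B[0] = Ord('B')) and (B[1] = Ord('M')) then
    Result := 'BMP'
  else
    Result := 'unknown';
end;

// Reads up to 12 bytes from the start of the file and sniffs them.
function SniffFile(const FileName: string): string;
var
  S: TFileStream;
  Buf: array[0..11] of Byte;
  N: Integer;
begin
  S := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    N := S.Read(Buf, SizeOf(Buf));   // fewer bytes for very small files
    Result := SniffBytes(Slice(Buf, N));
  finally
    S.Free;
  end;
end;

var
  I: Integer;
begin
  for I := 1 to ParamCount do
    Writeln(ParamStr(I), ': ', SniffFile(ParamStr(I)));
end.
```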
The "details" of parsing a file depend entirely on the file format. The file format specification tells the developer how the data is stored in binary form (above, the two bitmap structures are part of the Windows bitmap specification). It is like a contract, signed (not literally) by all encoders and decoders of such files. In the case of EBML, the specification appears to be available from the EBML project site linked in the question.
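For EBML, each element is stored as an ID, a size, and then the data, and both the ID and the size are variable-length integers: the number of leading zero bits in the first byte, plus one, gives the total length in bytes. A file starts with the EBML header element (ID $1A45DFA3), and one of its children, DocType (ID $4282), holds a string such as "matroska" or "webm". A minimal sketch based on that reading of the format; the function names are hypothetical, it does not handle "unknown" sizes, and the IDs should be checked against the specification:

```pascal
program EbmlHeader;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  EBML_HEADER_ID = $1A45DFA3;  // top-level EBML header element
  DOCTYPE_ID     = $4282;      // DocType child element

// Total length (1..8 bytes) of an EBML variable-length integer, given its
// first byte: the number of leading zero bits plus one. Returns 0 if invalid.
function VIntLength(First: Byte): Integer;
var
  Mask: Byte;
begin
  Mask := $80;
  Result := 1;
  while (Mask <> 0) and ((First and Mask) = 0) do
  begin
    Mask := Mask shr 1;
    Inc(Result);
  end;
  if Mask = 0 then
    Result := 0;
end;

// Reads a variable-length integer. Element IDs keep the length-marker bit
// (KeepMarker = True); element sizes have it stripped (KeepMarker = False).
function ReadVInt(S: TStream; KeepMarker: Boolean): Int64;
var
  First, B: Byte;
  Len, I: Integer;
begin
  S.ReadBuffer(First, 1);
  Len := VIntLength(First);
  if Len = 0 then
    raise Exception.Create('Invalid EBML variable-length integer');
  if KeepMarker then
    Result := First
  else
    Result := First and ($FF shr Len);  // clear the marker bit
  for I := 2 to Len do
  begin
    S.ReadBuffer(B, 1);
    Result := (Result shl 8) or B;
  end;
end;

var
  S: TFileStream;
  Id, Size, HeaderEnd: Int64;
  DocType: AnsiString;
begin
  if ParamCount < 1 then
  begin
    Writeln('usage: EbmlHeader <file.mkv>');
    Exit;
  end;
  S := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    if ReadVInt(S, True) <> EBML_HEADER_ID then
      raise Exception.Create('Not an EBML file');
    Size := ReadVInt(S, False);
    HeaderEnd := S.Position + Size;
    // Walk the children of the EBML header: each is ID, size, data
    while S.Position < HeaderEnd do
    begin
      Id := ReadVInt(S, True);
      Size := ReadVInt(S, False);
      if Id = DOCTYPE_ID then
      begin
        SetLength(DocType, Integer(Size));
        if Size > 0 then
          S.ReadBuffer(DocType[1], Integer(Size));
        Writeln('DocType: ', DocType);  // e.g. matroska or webm
      end
      else
      begin
        Writeln(Format('Element $%x, %d bytes', [Id, Size]));
        S.Seek(Size, soCurrent);        // skip data this sketch doesn't decode
      end;
    end;
  finally
    S.Free;
  end;
end.
```

Keeping the marker bit in IDs but stripping it from sizes matches how EBML writes IDs: $1A45DFA3 already includes its length marker.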
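Just use a TFileStream, like so:
uses
  Classes, SysUtils;

procedure ReadMyFile(const FileName: string);
var
  MyFile: TStream;
  MyVariable: Cardinal;  // placeholder for whatever you want to read
begin
  MyFile := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    // Read stuff; ReadBuffer raises EReadError if fewer bytes are available
    MyFile.ReadBuffer(MyVariable, SizeOf(MyVariable));
    // etc.
  finally
    MyFile.Free;
  end;
end;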
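Since MKV files can be 1-30 GB, note that in modern Delphi versions TStream.Position, Size and Seek work with Int64, so a TFileStream can address offsets beyond 4 GB without loading the file into memory. A minimal sketch that scans a file in fixed-size chunks and reads bytes at an arbitrary 64-bit offset; ScanFile, ReadAt and the 1 MB chunk size are illustrative assumptions:

```pascal
program ChunkedRead;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  CHUNK_SIZE = 1024 * 1024;  // 1 MB per read; an arbitrary choice

// Walks the whole file in fixed-size chunks, never holding more than one
// chunk in memory, and returns the number of bytes read.
function ScanFile(const FileName: string): Int64;
var
  S: TFileStream;
  Buf: TBytes;
  N: Integer;
begin
  Result := 0;
  SetLength(Buf, CHUNK_SIZE);
  S := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    repeat
      N := S.Read(Buf[0], CHUNK_SIZE);
      // ... parse Buf[0..N-1] here ...
      Inc(Result, N);
    until N < CHUNK_SIZE;
  finally
    S.Free;
  end;
end;

// Reads Count bytes at an absolute 64-bit offset (which may lie beyond 4 GB).
function ReadAt(S: TStream; Offset: Int64; Count: Integer): TBytes;
begin
  SetLength(Result, Count);
  S.Position := Offset;
  if Count > 0 then
    S.ReadBuffer(Result[0], Count);
end;

var
  S: TFileStream;
  Tail: TBytes;
begin
  if ParamCount < 1 then
  begin
    Writeln('usage: ChunkedRead <file>');
    Exit;
  end;
  Writeln('Bytes scanned: ', ScanFile(ParamStr(1)));
  S := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    if S.Size >= 16 then
    begin
      Tail := ReadAt(S, S.Size - 16, 16);  // last 16 bytes of the file
      Writeln('Read ', Length(Tail), ' bytes from offset ', S.Size - 16);
    end;
  finally
    S.Free;
  end;
end.
```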
You could memory map the file. Then you can access it as if you were accessing memory. See http://msdn.microsoft.com/en-us/library/aa366556(VS.85).aspx
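A minimal Windows-only sketch of that approach using CreateFileMapping and MapViewOfFile from the Windows unit. For multi-gigabyte files, especially in a 32-bit process, the whole file may not fit in the address space, so this maps only a small view; the view's starting offset must be a multiple of the system allocation granularity. ByteAt is a hypothetical helper that just returns one byte, to keep the example short:

```pascal
program MapViewDemo;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Returns the byte at Offset by mapping a small read-only view of the file.
// Views must start at a multiple of the allocation granularity (typically
// 64 KB), so the view starts at Offset rounded down to that boundary.
function ByteAt(const FileName: string; Offset: Int64): Byte;
var
  hFile, hMap: THandle;
  Info: TSystemInfo;
  Aligned: Int64;
  Delta: Cardinal;
  Base: Pointer;
begin
  hFile := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ, nil,
    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if hFile = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    // Maximum size 0/0: the mapping object covers the whole file
    hMap := CreateFileMapping(hFile, nil, PAGE_READONLY, 0, 0, nil);
    if hMap = 0 then
      RaiseLastOSError;
    try
      GetSystemInfo(Info);
      Aligned := Offset - (Offset mod Info.dwAllocationGranularity);
      Delta := Offset - Aligned;
      // Map only Delta + 1 bytes instead of the whole (possibly huge) file
      Base := MapViewOfFile(hMap, FILE_MAP_READ, Int64Rec(Aligned).Hi,
        Int64Rec(Aligned).Lo, Delta + 1);
      if Base = nil then
        RaiseLastOSError;
      try
        Result := PByte(PAnsiChar(Base) + Delta)^;
      finally
        UnmapViewOfFile(Base);  // must be the base address returned above
      end;
    finally
      CloseHandle(hMap);
    end;
  finally
    CloseHandle(hFile);
  end;
end;

begin
  if ParamCount < 1 then
  begin
    Writeln('usage: MapViewDemo <file>');
    Exit;
  end;
  Writeln('First byte: $', IntToHex(Integer(ByteAt(ParamStr(1), 0)), 2));
end.
```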