开发者

How can I tell what Database format a file (or set of files) was created with (in Delphi)?

I have a number of data files created by many different programs. Is there a way to determine the database and version of the database that was used to create the data file.

For example, I'd like to identify w开发者_StackOverflow中文版hich files are created from Microsoft Access, dBASE, FileMaker, FoxPro, SQLite or others.

I really just want to somehow quickly scan the files, and display information about them, including source Database and Version.

For reference, I'm using Delphi 2009.


First of all, check the file extension. Take a look at the corresponding wikipedia article, or other sites.

Then you can guess the file format from its so called "signature".

This is mostly the first characters content, which is able to identify the file format.

You've an updated list at this very nice Gary Kessler's website.

For instance, here is how our framework identify the MIME format from the file content, on the server side:

function GetMimeContentType(Content: Pointer; Len: integer;
  const FileName: TFileName=''): RawUTF8;
begin // see http://www.garykessler.net/library/file_sigs.html for magic numbers
  result := '';
  if (Content<>nil) and (Len>4) then
    case PCardinal(Content)^ of
    $04034B50: Result := 'application/zip'; // 50 4B 03 04
    $46445025: Result := 'application/pdf'; //  25 50 44 46 2D 31 2E
    $21726152: Result := 'application/x-rar-compressed'; // 52 61 72 21 1A 07 00
    $AFBC7A37: Result := 'application/x-7z-compressed';  // 37 7A BC AF 27 1C
    $75B22630: Result := 'audio/x-ms-wma'; // 30 26 B2 75 8E 66
    $9AC6CDD7: Result := 'video/x-ms-wmv'; // D7 CD C6 9A 00 00
    $474E5089: Result := 'image/png'; // 89 50 4E 47 0D 0A 1A 0A
    $38464947: Result := 'image/gif'; // 47 49 46 38
    $002A4949, $2A004D4D, $2B004D4D:
      Result := 'image/tiff'; // 49 49 2A 00 or 4D 4D 00 2A or 4D 4D 00 2B
    $E011CFD0: // Microsoft Office applications D0 CF 11 E0 = DOCFILE
      if Len>600 then
      case PWordArray(Content)^[256] of // at offset 512
        $A5EC: Result := 'application/msword'; // EC A5 C1 00
        $FFFD: // FD FF FF
          case PByteArray(Content)^[516] of
            $0E,$1C,$43: Result := 'application/vnd.ms-powerpoint';
            $10,$1F,$20,$22,$23,$28,$29: Result := 'application/vnd.ms-excel';
          end;
      end;
    else
      case PCardinal(Content)^ and $00ffffff of
        $685A42: Result := 'application/bzip2'; // 42 5A 68
        $088B1F: Result := 'application/gzip'; // 1F 8B 08
        $492049: Result := 'image/tiff'; // 49 20 49
        $FFD8FF: Result := 'image/jpeg'; // FF D8 FF DB/E0/E1/E2/E3/E8
        else
          case PWord(Content)^ of
            $4D42: Result := 'image/bmp'; // 42 4D
          end;
      end;
    end;
  if (Result='') and (FileName<>'') then begin
    case GetFileNameExtIndex(FileName,'png,gif,tiff,tif,jpg,jpeg,bmp,doc,docx') of
      0:   Result := 'image/png';
      1:   Result := 'image/gif';
      2,3: Result := 'image/tiff';
      4,5: Result := 'image/jpeg';
      6:   Result := 'image/bmp';
      7,8: Result := 'application/msword';
      else begin
        Result := RawUTF8(ExtractFileExt(FileName));
        if Result<>'' then begin
          Result[1] := '/';
          Result := 'application'+LowerCase(Result);
        end;
      end;
    end;
  end;
  if Result='' then
    Result := 'application/octet-stream';
end;

You can use a similar function, from the GAry Kessler's list.


There are lots of database engines with hundreds (if not thousands) of versions and formats. (Binary, CSV, XML...) Many of them are encrypted to protect the content. It is quite "impossible" to identify every database and every format and it is a subject of constant changes.

So first of all you have to limit your task to a list of database engines you want to scan. Thats what i would do...


First, I do not believe you could do more in a "quick scan" than provide a "possible format". Also, it's very difficult to imagine that any quick technique could be reliable.

DBASE files commonly use the extension .dbf. There are variants of the dBase file format used by FoxPro, and Clipper. Wikipedia documents these as xBase. Any dBase library that can open dBase files will also probably be able to (a) show that this is in fact a true dBase file by opening it, and (b) allow you to see which supported variants of the xBase file format are in use.

Access files are usually using the .mdb file format, but can be encrypted with a password. You could probably write your own library that could postiively identify the internal content as being of the "Jet database engine" (internal type of file used by Access) but not read the content, but I doubt that short of cracking the password, you could do this reliably.

FileMaker files can have many file extensions, and their internal file formats are not well documented. According to wikipedia, .fm .fp3 .fp5 and .fp7 are common file extensions. You will have similar "password" problems with filemaker databases, as with Access. I am not aware of any way to read filemaker files in delphi except through ODBC, and even then, I don't think you could provide an "omni-reader" in Delphi that was powered by ODBC, since ODBC requires careful setup and knowledge of the originating file into an odbc data source before it becomes readable through ODBC. Browse/Discovery is not a phase that is supported by ODBC.

SQLite files can have any file extension at all. The easiest way to try to detect it would be to load/open the file using SQLite and see if it opens.

The rest of the list is more or less infinite, and the technique would be the same. Just keep rolling more database engines and access layer libraries into your Katamari Damaci Database Detector Tool.

If you want to start with old database formats as you seem to be, I would investigate using BDE (ancient, but hey, you're talking about ancient stuff), plus ADO, to try to auto-detect and open files.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜