Delphi: compare text file contents

We need to compare the contents of two (or more) text files to determine if we need to create a backup. If they differ we create a new backup.

I currently use the CRC value of each file to check for differences, but I was wondering if there is a more efficient or elegant way of detecting differences between two files.

// Use madZIP to calculate the CRC for this file
GetUncompressedFileInfo(Filename_1, Size_1, NewCRC);

// Use madZIP to calculate the CRC for this file
GetUncompressedFileInfo(Filename_2, Size_2, OldCRC);

//if ThisFileHash = ExistingFileHash then
if (OldCRC <> NewCRC) then
  CreateABackup;

Regards, Pieter.


CRC is not a safe method to detect file changes - cryptographic hashes (like MD5 or SHA1) are much better.
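
As a rough sketch of the hash-based check, assuming a recent Delphi whose RTL ships the System.Hash unit with THashSHA2.GetHashStringFromFile (older versions would need a third-party hashing library instead). FilesDifferByHash is a hypothetical helper name, not something from madZIP:

```pascal
program HashCompare;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Hash;

// Hypothetical helper: True when the two files have different SHA-256 digests.
// Assumes THashSHA2.GetHashStringFromFile exists in the installed RTL.
function FilesDifferByHash(const FileName1, FileName2: string): Boolean;
begin
  Result := THashSHA2.GetHashStringFromFile(FileName1) <>
            THashSHA2.GetHashStringFromFile(FileName2);
end;

begin
  if ParamCount <> 2 then
    Writeln('usage: HashCompare <file1> <file2>')
  else if FilesDifferByHash(ParamStr(1), ParamStr(2)) then
    Writeln('Files differ, a new backup is needed')
  else
    Writeln('Files are identical');
end.
```

Both files are read in full, so this costs about the same I/O as the CRC approach; the gain is a much lower chance of two different files producing the same value.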

Another approach (like the one used by build systems) is to compare file dates. If the file is newer than backup, a new backup is needed.


CRC is probably more accurate, and pretty efficient. However, do you actually need to check the contents?

I'm assuming you're checking the CRC to see if a modification has been made and re-backup the updated file. In which case FileAge() would do just fine.
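
A minimal sketch of the timestamp check, assuming the FileAge overload that returns a TDateTime through an out parameter. BackupIsStale is a hypothetical name, and treating a missing backup as stale is an assumption of this sketch:

```pascal
program AgeCompare;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: True when SourceFile is newer than BackupFile,
// or when BackupFile does not exist yet.
function BackupIsStale(const SourceFile, BackupFile: string): Boolean;
var
  SourceTime, BackupTime: TDateTime;
begin
  if not FileAge(SourceFile, SourceTime) then
    raise Exception.CreateFmt('Cannot read timestamp of %s', [SourceFile]);
  // Assumption: a missing backup always counts as stale.
  Result := (not FileAge(BackupFile, BackupTime)) or (SourceTime > BackupTime);
end;

begin
  if ParamCount <> 2 then
    Writeln('usage: AgeCompare <source> <backup>')
  else if BackupIsStale(ParamStr(1), ParamStr(2)) then
    Writeln('Backup is out of date')
  else
    Writeln('Backup is current');
end.
```

This never opens the files, so it is far cheaper than any CRC or hash, but it also reports a change when a file was merely re-saved with identical contents.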


You should also consider using an incremental backup.

I've published some optimized file versioning functions for our SynProject Open Source tool. The TVersions class, in ProjectVersioning unit allows binary diff storage inside a zip container.

Our proprietary but faster-than-zip SynLZ algorithm is used to store incremental differences. It works very well in practice.

See e.g. TVersions.FillStrings method for retrieving a list of files to be updated.

Be aware that you may discover a one-hour difference, depending on the current Daylight saving time. Here is how we allow a per-date comparison:

function SameFileDateWindows(FileDate1,FileDate2: integer): boolean;
// we allow an exact one Hour round (NTFS bug on summer time zone change)
begin
  dec(FileDate1,FileDate2);
  result := (FileDate1=0) or (FileDate1=1 shl 11) or (FileDate1=-(1 shl 11));
end;

We don't read the file content here. For backup purposes, it's enough to rely on the file date to mark the file as one to be compared. A binary diff is then performed between both versions of the file. If the file content is the same, only the date difference is stored.

IMHO you should not use the proprietary madZIP container, but a standard one, like .zip. There are several implementations around, including our version used in SynProject and our ORM. It's faster than madZIP and decompression is in optimized asm. See the SynZip unit for low-level compression and a simple .zip reader and writer, and more evolved classes in SynZipFiles (used in SynProject). For a pure Delphi version, like the madZIP one, check the PasZip unit, which is faster than madZIP (but PasZip won't compile with Unicode Delphi, whereas SynZip does).


Actually, best practice to assure file identity is to store content hashes (e.g. CRC-32 or any other hash function) together with the file sizes. Doing so increases reliability considerably. As for storing them: there is no need to compute the hash more than once for contents known to be unchanged.
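
One way this could look, as a sketch only: the signature of the backed-up file is stored once, and the current file is hashed only when its size still matches. TFileSignature, FileSizeOf, ComputeSignature and HasChanged are hypothetical names, and SHA-256 from System.Hash (assumed available in a recent Delphi) stands in for "any other hash function":

```pascal
program SignatureCompare;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Hash;

type
  // Hypothetical record saved alongside each backup.
  TFileSignature = record
    Size: Int64;
    Hash: string;
  end;

function FileSizeOf(const FileName: string): Int64;
var
  Stream: TFileStream;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Result := Stream.Size;
  finally
    Stream.Free;
  end;
end;

// Computed once, when the backup is created.
function ComputeSignature(const FileName: string): TFileSignature;
begin
  Result.Size := FileSizeOf(FileName);
  Result.Hash := THashSHA2.GetHashStringFromFile(FileName);
end;

function HasChanged(const FileName: string; const Stored: TFileSignature): Boolean;
begin
  // A size mismatch settles it without reading the whole file.
  if FileSizeOf(FileName) <> Stored.Size then
    Exit(True);
  Result := THashSHA2.GetHashStringFromFile(FileName) <> Stored.Hash;
end;

var
  Signature: TFileSignature;
begin
  if ParamCount = 1 then
  begin
    Signature := ComputeSignature(ParamStr(1));
    Writeln('Changed since signature was taken: ', HasChanged(ParamStr(1), Signature));
  end;
end.
```

How the signature is persisted between runs (a side file, a database row, a field in the archive) is left open here.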
