Using base64 encoding as a mechanism to detect changes
Is it possible to use changes in the base64 encoding of an object to detect the degree of change in the object itself?
Suppose I send a document attachment to several users, and each makes changes to it and emails it back to me. Can I use the string distance between the original base64 and each received base64 to detect which version has the most changes? Would that be a valid metric?
If not, would there be any other metrics to quantify the deltas?
That depends entirely on the type of document you encoded. If it is a plain text file, then sure, the differences between the base64 encodings are probably on a par with the actual changes. However, in some file formats a change to the contents produces an effectively completely different binary file. A ZIP file is one example.
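To illustrate the contrast between plain text and a compressed container, here is a minimal Python sketch. It uses zlib as a stand-in for a ZIP-style compressed format, `difflib.SequenceMatcher` as one possible string-similarity measure, and a made-up 200-word document in which one word is replaced by another of the same length. All of these are assumptions chosen for illustration, not anything taken from the question.

```python
import base64
import difflib
import zlib


def b64_similarity(a: bytes, b: bytes) -> float:
    """Similarity in [0, 1] between the base64 encodings of two byte strings."""
    ea = base64.b64encode(a).decode("ascii")
    eb = base64.b64encode(b).decode("ascii")
    # autojunk=False: the default heuristic ignores frequently occurring
    # characters on long inputs, which would distort the score here.
    return difflib.SequenceMatcher(None, ea, eb, autojunk=False).ratio()


# Hypothetical document: 200 varied words.
words = [f"word{(i * 7919) % 1000}" for i in range(200)]
original = " ".join(words).encode("utf-8")

# Edited copy: one early word ("word595") replaced by a same-length word.
edited_words = list(words)
edited_words[5] = "CHANGED"
edited = " ".join(edited_words).encode("utf-8")

plain_score = b64_similarity(original, edited)
packed_score = b64_similarity(zlib.compress(original), zlib.compress(edited))

print(f"plain text: {plain_score:.3f}")
print(f"compressed: {packed_score:.3f}")
```

With the plain text, the score stays close to 1 because only a handful of base64 characters change. With the compressed data, the same one-word edit should give a much lower score, so the base64 distance says little about how much the document really changed.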
You should do the same thing that diff does. Then, for example, use the size of the resulting diff as the metric.
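A diff-based metric along those lines might look like the following Python sketch. It assumes the documents are plain text and counts added plus removed lines using the standard `difflib` module; the function name `diff_size` and the sample versions are hypothetical.

```python
import difflib


def diff_size(original: str, revised: str) -> int:
    """Number of lines removed from `original` plus lines added in `revised`."""
    a = original.splitlines()
    b = revised.splitlines()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            total += (i2 - i1) + (j2 - j1)  # removed lines + added lines
    return total


original = "alpha\nbeta\ngamma\ndelta\n"

# Hypothetical versions returned by two users.
versions = {
    "alice": "alpha\nbeta\ngamma\ndelta\nepsilon\n",  # one line appended
    "bob": "alpha\nBETA\ngamma\n",                     # one line changed, one deleted
}

scores = {name: diff_size(original, text) for name, text in versions.items()}
most_changed = max(scores, key=scores.get)
print(scores, most_changed)
```

Counting lines is only one choice. The byte size of a unified diff or a character-level edit distance would work the same way, and which one is appropriate depends on how the documents are usually edited.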
In theory, yes, if you do a smart diff (detecting inserts, deletions, and modifications).
In practice, no, unless the documents are absolutely plain text. Binary formats can't be meaningfully diff'd.
Base64 packs each group of three 8-bit values into four 6-bit values. If you change one 8-bit value by a single bit, you affect only one of the 6-bit values. If you change it by two bits, there is about a 5/12 chance that the second bit lands in a different 6-bit value from the first. So if you are counting bits, the comparison is entirely equivalent; otherwise, you will introduce noise that depends on the metric you use.
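The bit-level argument can be checked with a short Python sketch. The 5/12 figure is reproduced under one particular reading, which is an assumption here: the two bit positions are picked independently and uniformly within a single byte, so they may coincide. The last part shows a related caveat for positional comparisons: inserting a single byte shifts the 3-byte grouping, so characters after the insertion point no longer line up.

```python
import base64
from fractions import Fraction


def b64_chars_changed(a: bytes, b: bytes) -> int:
    """Count positions at which the two base64 encodings differ."""
    ea, eb = base64.b64encode(a), base64.b64encode(b)
    return sum(x != y for x, y in zip(ea, eb))


data = bytes(range(30))

# Flip a single bit: exactly one 6-bit value, so one base64 character, changes.
flipped = bytearray(data)
flipped[10] ^= 0x01
one_bit = b64_chars_changed(data, bytes(flipped))
print(one_bit)  # 1

# Two bit positions in the same byte, chosen independently (assumed model):
# bit j of byte k sits at position 8*k + j in the 24-bit group, and position p
# belongs to 6-bit value p // 6.
split = sum(
    (8 * k + i) // 6 != (8 * k + j) // 6
    for k in range(3)
    for i in range(8)
    for j in range(8)
)
chance = Fraction(split, 3 * 8 * 8)
print(chance)  # 5/12

# Inserting one byte at the front shifts the alignment of everything after it.
shifted = b"\xff" + data
after_insert = b64_chars_changed(data, shifted)
print(after_insert)
```

If the two bits are instead required to be distinct, the same enumeration gives a slightly larger fraction, which is one reason the figure is only approximate.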