Comparing uncompressed local files to compressed files stored on Amazon S3?

We put hundreds of image files on Amazon S3 that our users need to synchronize to their local directories. In order to save storage space and bandwidth, we zip the files stored on S3.

On the user's end, a Python script runs every 5 minutes to get the current list of files and download new/updated ones.

My question is: what's the best way to determine which files are new or changed and need to be downloaded?

Currently we add a custom metadata header to each compressed file that contains the MD5 value of the uncompressed file...

We start with a file like this:

image_file_1.tif   17MB    MD5 = xxxx1234

We compress it (with 7zip) and upload it to S3 (with Python/Boto):

image_file_1.tif.z  9MB    MD5 = yyy3456    x-amz-meta-uncompressedmd5 = xxxx1234
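
For reference, here is a minimal sketch of that upload step with boto (the bucket, key, and function names are hypothetical). It computes the MD5 of the uncompressed file and attaches it as user metadata, which S3 serves back as the x-amz-meta-uncompressedmd5 header:

    import hashlib

    import boto
    from boto.s3.key import Key

    def upload_compressed(bucket_name, uncompressed_path, compressed_path, key_name):
        """Upload a compressed file, tagging it with the uncompressed file's MD5."""
        # Compute the MD5 of the *uncompressed* source file in chunks.
        md5 = hashlib.md5()
        with open(uncompressed_path, 'rb') as f:
            for chunk in iter(lambda: f.read(8192), b''):
                md5.update(chunk)

        bucket = boto.connect_s3().get_bucket(bucket_name)
        key = Key(bucket, key_name)
        # Stored by S3 and returned as the x-amz-meta-uncompressedmd5 header.
        key.set_metadata('uncompressedmd5', md5.hexdigest())
        key.set_contents_from_filename(compressed_path)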

The problem is we can't get a list of files from S3 that includes the x-amz-meta-uncompressedmd5 header without making an additional API call for EACH file (slow for hundreds/thousands of files).

Our most practical solution so far: have users get the full list of files (without the extra headers) and download any file that does not exist locally. If a file does exist locally, make an additional API call to get its full headers and compare the local MD5 checksum against x-amz-meta-uncompressedmd5.
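
In code, that approach looks roughly like this (a sketch with boto; the ".z" suffix handling and helper names are assumptions based on the naming above):

    import hashlib
    import os

    import boto

    def local_md5(path):
        """Hex MD5 of a local (uncompressed) file."""
        h = hashlib.md5()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(8192), b''):
                h.update(chunk)
        return h.hexdigest()

    def keys_to_download(bucket_name, local_dir):
        """Return the S3 keys that are missing locally or have changed."""
        bucket = boto.connect_s3().get_bucket(bucket_name)
        wanted = []
        for listed in bucket.list():        # LIST results carry no user metadata
            name = listed.name[:-2] if listed.name.endswith('.z') else listed.name
            local_path = os.path.join(local_dir, name)
            if not os.path.exists(local_path):
                wanted.append(listed.name)  # new file: no extra call needed
                continue
            key = bucket.get_key(listed.name)   # HEAD request per existing file
            if key.get_metadata('uncompressedmd5') != local_md5(local_path):
                wanted.append(listed.name)
        return wanted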

I'm thinking there must be a better way.


You could include the MD5 hash of the uncompressed image in the compressed filename.

So image_file_1.tif could become image_file_1.xxxx1234.tif.z

Your users' Python sync script would then have the information it needs to decide whether to fetch the file again from S3, and could either strip the MD5 out of the filename or keep it, depending on what you want to do.
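
A sketch of the client-side check under that naming scheme (the regex assumes full 32-character hex digests; the question's "xxxx1234" values are placeholders):

    import hashlib
    import os
    import re

    # Assumed naming scheme: <stem>.<32-hex-md5>.<ext>.z, e.g.
    # image_file_1.d41d8cd98f00b204e9800998ecf8427e.tif.z
    NAME_RE = re.compile(r'^(?P<stem>.+)\.(?P<md5>[0-9a-f]{32})\.(?P<ext>[^.]+)\.z$')

    def local_md5(path):
        h = hashlib.md5()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(8192), b''):
                h.update(chunk)
        return h.hexdigest()

    def needs_download(key_name, local_dir):
        """Decide from the key name alone whether to fetch this object."""
        m = NAME_RE.match(key_name)
        if not m:
            return True                     # unexpected name: fetch to be safe
        local_path = os.path.join(
            local_dir, '%s.%s' % (m.group('stem'), m.group('ext')))
        return (not os.path.exists(local_path)
                or local_md5(local_path) != m.group('md5'))

The win here is that a single LIST request gives the sync script everything it needs; no per-file HEAD requests at all.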

Alternatively, you could maintain a single file on S3 containing the full file list with the MD5 metadata. The Python script then only needs to fetch that one file, parse it, and decide what to do.
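
A minimal sketch of that idea (the manifest key name and the tab-separated format are assumptions; you could also append to the manifest at upload time instead of rebuilding it):

    import boto

    MANIFEST_KEY = 'manifest.txt'   # hypothetical: one "key<TAB>md5" line per file

    def write_manifest(bucket):
        """Publisher side: rebuild the manifest from each key's metadata."""
        lines = []
        for listed in bucket.list():
            if listed.name == MANIFEST_KEY:
                continue
            # HEAD per file, but only the publisher pays this cost, once.
            key = bucket.get_key(listed.name)
            lines.append('%s\t%s' % (key.name, key.get_metadata('uncompressedmd5')))
        bucket.new_key(MANIFEST_KEY).set_contents_from_string('\n'.join(lines))

    def read_manifest(bucket):
        """Client side: a single GET returns every key's uncompressed MD5."""
        body = bucket.get_key(MANIFEST_KEY).get_contents_as_string()
        return dict(line.split('\t')
                    for line in body.decode('utf-8').splitlines() if line)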
