Getting newest S3 keys first
I am writing an app that stores (potentially millions of) objects in an S3 bucket. My app will take the most recent object (roughly), process it, and write it back to the same bucket. I need a way of accessing keys and naming new objects so that the app can easily get to the newest objects.
I know I can do this properly by putting metadata in SimpleDB, but I don't need hard consistency. It's ok if the app grabs an object that isn't quite the newest. I just need the app to tend to grab new-ish keys instead of old ones. So I'm trying to keep it simple by using S3 alone.
Is there a way to access and sort on S3 metadata? Or might there be a scheme for naming the objects that would get me what I need (since I know S3 lists keys in lexicographic order and boto can handle paging)?
S3 versioning really helps out here. If these are really the same "thing", you can turn on versioning for your bucket, get the data from your key, modify it, and store it back to the same key.
To read the versions back, you'll need to use boto's
bucket.get_all_versions( prefix='yourkeynamehere' )
You get the versions out most recent first, so although this function doesn't handle paging, you can just take the first element and you've got the most recent version.
If you want to go back further and need paging, boto also offers a list_versions() function that takes a prefix as well. It gives you a result set that iterates through all the versions without you needing to worry about paging.
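Here is a minimal sketch of how the two calls above might be wrapped, assuming boto 2 and a bucket that already has versioning enabled. The helper names newest_version and recent_versions are made up for illustration, and the code only relies on the bucket object having list_versions(prefix=...). Because a prefix also matches longer key names, the helpers filter on the exact key name. Note that the newest entry can be a delete marker if the key was deleted.

```python
from itertools import islice


def recent_versions(bucket, key_name, limit=10):
    """Return up to `limit` versions of `key_name`, most recent first.

    Assumption: `bucket` is a boto 2 Bucket with versioning enabled.
    list_versions() pages through the results lazily, so only as many
    pages as needed are fetched.
    """
    # The prefix also matches longer key names (e.g. "job" matches
    # "job-old"), so keep only entries for the exact key.
    matches = (v for v in bucket.list_versions(prefix=key_name)
               if v.name == key_name)
    return list(islice(matches, limit))


def newest_version(bucket, key_name):
    """Return the most recent version of `key_name`, or None if absent."""
    versions = recent_versions(bucket, key_name, limit=1)
    return versions[0] if versions else None
```

With a real bucket, usage would look roughly like newest = newest_version(bucket, 'yourkeynamehere'), followed by bucket.get_key(newest.name, version_id=newest.version_id) to fetch that exact version.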
If these objects really aren't the "same" object, it doesn't matter, because S3 doesn't store diffs; it stores the whole thing every time. If you have multiple "types" of objects, you can keep multiple version sets (one key per type) and pull the most recent version from each.
I've been using versioning and I'm pretty happy with it.