How to list objects by extension using the S3 API?

Can I somehow search objects in S3 by extension, not only by prefix?

Here is what I have now:

ListObjectsResponse r = s3Client.ListObjects(new Amazon.S3.Model.ListObjectsRequest()
{
    BucketName = BucketName,
    Marker = marker,
    Prefix = folder, 
    MaxKeys = 1000
});

So, I need to list all *.xls files in my bucket.


While I do think the BEST answer is to use a database to keep track of your files for you, I also think it's an incredible pain in the ass. I was working in Python with boto3, and this is the solution I came up with.

It's not elegant, but it works: list all the files, then filter the list in code down to the ones with the "suffix"/"extension" you want. S3 still returns every key under the prefix; only the suffix check happens on your side.

import boto3

s3_client = boto3.client('s3')
bucket = 'my-bucket'
prefix = 'my-prefix/foo/bar'
paginator = s3_client.get_paginator('list_objects_v2')
response_iterator = paginator.paginate(Bucket=bucket, Prefix=prefix)

file_names = []

for response in response_iterator:
    # A page has no 'Contents' key when nothing matches the prefix
    for object_data in response.get('Contents', []):
        key = object_data['Key']
        if key.endswith('.json'):
            file_names.append(key)

print(file_names)
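The same loop can be wrapped in a reusable generator. This is a minimal sketch, assuming you pass in an already-created boto3 S3 client; the function name `iter_keys_with_suffix` is hypothetical. `str.endswith` accepts a tuple, so several extensions (for example `.xls` and `.xlsx`) can be checked in one call. The filtering is still client-side: S3 only narrows the listing by prefix.

```python
from typing import Iterable, Iterator


def iter_keys_with_suffix(client, bucket: str, prefix: str,
                          suffixes: Iterable[str]) -> Iterator[str]:
    """Yield keys under `prefix` whose names end with one of `suffixes`.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client('s3').
    S3 filters by prefix only; the suffix test below runs locally.
    """
    wanted = tuple(suffixes)  # str.endswith accepts a tuple of suffixes
    paginator = client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages without matching objects have no 'Contents' key.
        for obj in page.get('Contents', []):
            if obj['Key'].endswith(wanted):
                yield obj['Key']
```

Usage might look like `list(iter_keys_with_suffix(boto3.client('s3'), 'my-bucket', 'my-prefix/', ['.xls', '.xlsx']))`. Because it is a generator, keys are produced page by page instead of being collected into one list first.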


I don't believe this is possible with S3: the list API only filters by prefix (and delimiter), not by suffix.

The best solution is to "index" S3 using a database (SQL Server, MySQL, SimpleDB, etc.) and run your queries against that.
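A minimal sketch of such an index, using Python's built-in `sqlite3` as a stand-in for whichever database you pick. The table layout and function names are hypothetical, and the sketch assumes your upload code calls `record_upload` after every successful write to S3 (a matching delete hook would be needed to keep the index in sync).

```python
import posixpath
import sqlite3
from typing import List


def open_index(path: str = ':memory:') -> sqlite3.Connection:
    """Open the index database and create the table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS s3_objects ('
        ' bucket TEXT NOT NULL,'
        ' object_key TEXT NOT NULL,'
        ' extension TEXT NOT NULL,'
        ' PRIMARY KEY (bucket, object_key))'
    )
    # Lets "all .xls files in bucket X" be answered without scanning every row
    conn.execute(
        'CREATE INDEX IF NOT EXISTS idx_bucket_ext ON s3_objects (bucket, extension)'
    )
    return conn


def record_upload(conn: sqlite3.Connection, bucket: str, key: str) -> None:
    """Register an S3 key in the index (call after each upload)."""
    ext = posixpath.splitext(key)[1].lower()  # e.g. '.xls', or '' if none
    conn.execute(
        'INSERT OR REPLACE INTO s3_objects (bucket, object_key, extension) '
        'VALUES (?, ?, ?)',
        (bucket, key, ext),
    )
    conn.commit()


def keys_with_extension(conn: sqlite3.Connection, bucket: str, ext: str) -> List[str]:
    """Query the index instead of listing the bucket; `ext` includes the dot."""
    rows = conn.execute(
        'SELECT object_key FROM s3_objects '
        'WHERE bucket = ? AND extension = ? ORDER BY object_key',
        (bucket, ext.lower()),
    )
    return [row[0] for row in rows]
```

The original key casing is stored, while the extension column is lower-cased so that `report.XLS` and `report.xls` are found by the same query.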


You don't actually need a separate database to do this for you.

S3 gives you the ability to list objects in a bucket with a certain prefix. Your dilemma is that the ".xls" extension is at the end of the file name, therefore, prefix search doesn't help you. However, when you put the file into the bucket, you can change the object name so that the prefix contains the file type (for example: XLS-myfile.xls). Then, you can use the S3 API listObjects and pass a prefix of "XLS".
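A sketch of that naming convention in Python. The `XLS-` style tag comes from the answer above; the helper names and the `NOEXT` fallback are hypothetical. Because `Prefix` is matched against the whole key, the tag must come right after any "folder" part, so a file stored under `reports/` is found with the prefix `reports/XLS-`, not `XLS`.

```python
import posixpath
from typing import List


def typed_key(folder: str, filename: str) -> str:
    """Build a key such as 'reports/XLS-myfile.xls' (hypothetical naming scheme)."""
    ext = posixpath.splitext(filename)[1].lstrip('.').upper() or 'NOEXT'
    # posixpath.join('', name) returns name unchanged, so an empty folder works too
    return posixpath.join(folder, f'{ext}-{filename}')


def list_by_type(client, bucket: str, folder: str, ext: str) -> List[str]:
    """List keys tagged with `ext`, relying only on S3's server-side prefix filter.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client('s3').
    """
    prefix = posixpath.join(folder, ext.lstrip('.').upper() + '-')
    paginator = client.get_paginator('list_objects_v2')
    keys: List[str] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj['Key'] for obj in page.get('Contents', []))
    return keys
```

Uploads would then use the generated key, for example `client.upload_file('myfile.xls', 'my-bucket', typed_key('reports', 'myfile.xls'))`. Objects that were uploaded before the convention was adopted will not match until they are copied to a tagged key.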


I iterate over the objects after fetching their information. The end result is a list with one dict per file.

import posixpath

import boto3

s3 = boto3.resource('s3')

bucket = s3.Bucket('bucket_name')

# get information for all objects in the bucket
files = bucket.objects.all()

# empty list for the final information
files_information = []

# your known extensions list; file extensions are compared against it
extensions = ['png', 'jpg', 'txt', 'docx']

# Iterate through 'files', build a dict per file and add an 'extension' key.
for file in files:
    # splitext works for extensions of any length ('docx', not only 3 chars)
    ext = posixpath.splitext(file.key)[1].lstrip('.').lower()
    if ext in extensions:
        files_information.append({'file_name': file.key, 'extension': ext})
    else:
        files_information.append({'file_name': file.key, 'extension': 'unknown'})


print(files_information)
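Instead of one dict per file, the keys can also be grouped into a single dict keyed by extension. A small sketch; `group_by_extension` is a hypothetical name, and it accepts any iterable of key strings, for example `(obj.key for obj in bucket.objects.all())`. Extensions are lower-cased, keys without one are grouped under `''`, and keys ending in `/` (zero-byte "folder" placeholders) are skipped.

```python
import posixpath
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_extension(keys: Iterable[str]) -> Dict[str, List[str]]:
    """Map each lower-cased extension (without the dot) to the keys that have it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        if key.endswith('/'):
            continue  # skip "folder" placeholder objects
        ext = posixpath.splitext(key)[1].lstrip('.').lower()
        groups[ext].append(key)
    return dict(groups)
```

Looking up `group_by_extension(...).get('xls', [])` then gives all `.xls` keys, and every other extension is available from the same single pass over the listing.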


When you use the boto3 resource API to get the objects from S3, you can filter the returned keys by file extension yourself, like this:

import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my_bucket')
files = my_bucket.objects.all()
file_list = []
for file in files:
    if file.key.endswith('.docx'):
        file_list.append(file.key)

Change the string passed to endswith to whatever extension you need.
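Note that `endswith` is case-sensitive, so `Report.DOCX` would be missed by the check above. A small variant that lower-cases both sides and always includes the dot, so a key like `notdocx` does not match; `matches_extension` is a hypothetical helper name.

```python
from typing import Iterable


def matches_extension(key: str, extensions: Iterable[str]) -> bool:
    """Case-insensitive extension test; extensions may be given with or without a dot."""
    wanted = tuple('.' + ext.lstrip('.').lower() for ext in extensions)
    return key.lower().endswith(wanted)
```

With the resource API this could be used as `[obj.key for obj in my_bucket.objects.all() if matches_extension(obj.key, ['docx'])]`.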


If you're just searching, you can probably find them with a combination of the AWS CLI and grep:

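aws s3 ls s3://<your-bucket-name> --recursive | grep '\.<your-file-extension>$'

The escaped dot and the trailing $ make grep match the extension only at the end of the key, not anywhere in the path.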
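The same filter can be applied in Python to the CLI's output, which makes it explicit that only the key column is checked, and only its end. This is a sketch assuming each line of `aws s3 ls --recursive` output has the form `date time size key`; splitting at most three times keeps keys that contain spaces intact. `keys_from_ls_output` is a hypothetical name.

```python
from typing import List


def keys_from_ls_output(text: str, extension: str) -> List[str]:
    """Return keys ending in `extension` (case-insensitive) from `aws s3 ls --recursive` output."""
    suffix = '.' + extension.lstrip('.').lower()
    keys = []
    for line in text.splitlines():
        # Assumed columns: date, time, size, key; the key itself may contain spaces
        parts = line.split(None, 3)
        if len(parts) == 4 and parts[3].lower().endswith(suffix):
            keys.append(parts[3])
    return keys
```

The text could come from something like `subprocess.run(['aws', 's3', 'ls', 's3://my-bucket', '--recursive'], capture_output=True, text=True, check=True).stdout`, where `my-bucket` is a placeholder.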


You can list the elements by extension by getting all the elements (including "folder" placeholder keys) and then filtering with key.endswith('...'):

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('your-route')
test_dir = 'your-prefix/'  # placeholder: the "folder" to search in

# Prefix is applied by S3 itself; the endswith check runs locally on each key
for obj in bucket.objects.filter(Prefix=test_dir):
    if obj.key.endswith('.zicu'):
        print('Value of object: ', obj.key)

In this case the listing is restricted to a prefix (test_dir), and then only the elements with the .zicu extension are printed.
