开发者

Direct Upload to S3 Using Python/Boto/Django to Construct Policy

I have been through many iterations of this problem so far, searched out many different examples, and have been all through the documentation.

I am trying to combine Plupload (http://www.plupload.com/) with the AWS S3 direct post method (http://aws.amazon.com/articles/1434). However, I believe there's something wrong with the way I am constructing my policy and s开发者_如何转开发ignature for transmission. When I submit the form, I don't get a response from the server, but rather my connection to the server is reset.

I have attempted using the python code in the example:

import base64
import hmac, sha

policy = base64.b64encode(policy_document)

signature = base64.b64encode(
hmac.new(aws_secret_key, policy, sha).digest())

I have also tried to use the more up-to-date hashlib library in python. Whatever method I use to construct my policy and signature, I always get different values than those generated here:

http://s3.amazonaws.com/doc/s3-example-code/post/post_sample.html

I have read through this question:

How do I make Plupload upload directly to Amazon S3?

But I found the examples provided to be overly complicated and wasn't able to accurately implement them.

My most recent attempts have been to use portions of the boto library:

http://boto.cloudhackers.com/ref/s3.html#module-boto.s3.connection

But using the S3Commection.build_post_form_args method has not worked for me either.

If anyone could provide a proper example of how to create the post form using python, I would very much appreciate it. Even some simple insights on why the connection is always reset would be nice.

Some caveats:

I would like to use hashlib if possible. I want to get an XML response from Amazon (presumably "success_action_status = '201'" does this) I need to be able to upload largish type files, max size ~2GB.

One final note, when I run this in Chrome, it provides upload progress, and the upload usually fails around 37%.


Nathan's answer helped get me started. I've included two solutions that are currently working for me.

The first solution uses plain Python. The second uses boto.

I tried to get boto working first, but kept getting errors. So I went back to the Amazon ruby documentation and got S3 to accept files using python without boto. (Browser Uploads to S3 using HTML POST)

After understanding what was going on, I was able to fix my errors and use boto, which is a simpler solution.

I'm including solution 1 because it shows explicitly how to setup the policy document and signature using python.

My goal was to create the html upload page as a dynamic page, along with the "success" page the user sees after a successful upload. Solution 1 shows the dynamic creation of the form upload page, while solution 2 shows the creation of both the upload form page and the success page.

Solution 1:

import base64
import hmac, hashlib

###### EDIT ONLY THE FOLLOWING ITEMS ######

DEBUG = 1
AWS_SECRET_KEY = "MySecretKey"
AWS_ACCESS_KEY = "MyAccessKey"
HTML_NAME = "S3PostForm.html"
EXPIRE_DATE = "2015-01-01T00:00:00Z" # Jan 1, 2015 gmt
FILE_TO_UPLOAD = "${filename}"
BUCKET = "media.mysite.com"
KEY = ""
ACL = "public-read" # or "private"
SUCCESS = "http://media.mysite.com/success.html"
CONTENT_TYPE = ""
CONTENT_LENGTH = 1024**3 # One gigabyte
HTTP_OR_HTTPS = "http" # Or "https" for better security
PAGE_TITLE = "My Html Upload to S3 Form"
ACTION = "%s://%s.s3.amazonaws.com/" % (HTTP_OR_HTTPS, BUCKET)

###### DON'T EDIT FROM HERE ON DOWN ######

policy_document_data = {
"expire": EXPIRE_DATE,
"bucket_name": BUCKET,
"key_name": KEY,
"acl_name": ACL,
"success_redirect": SUCCESS,
"content_name": CONTENT_TYPE,
"content_length": CONTENT_LENGTH,
}

policy_document = """
{"expiration": "%(expire)s",
  "conditions": [ 
    {"bucket": "%(bucket_name)s"}, 
    ["starts-with", "$key", "%(key_name)s"],
    {"acl": "%(acl_name)s"},
    {"success_action_redirect": "%(success_redirect)s"},
    ["starts-with", "$Content-Type", "%(content_name)s"],
    ["content-length-range", 0, %(content_length)d]
  ]
}
""" % policy_document_data

policy = base64.b64encode(policy_document)
signature = base64.b64encode(hmac.new(AWS_SECRET_KEY, policy, hashlib.sha1).digest())

html_page_data = {
"page_title": PAGE_TITLE,
"action_name": ACTION,
"filename": FILE_TO_UPLOAD,
"access_name": AWS_ACCESS_KEY,
"acl_name": ACL,
"redirect_name": SUCCESS,
"policy_name": policy,
"sig_name": signature,
"content_name": CONTENT_TYPE,
}

html_page = """
<html> 
 <head>
  <title>%(page_title)s</title> 
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
 </head>
<body>
 <form action="%(action_name)s" method="post" enctype="multipart/form-data">
  <input type="hidden" name="key" value="%(filename)s">
  <input type="hidden" name="AWSAccessKeyId" value="%(access_name)s">
  <input type="hidden" name="acl" value="%(acl_name)s">
  <input type="hidden" name="success_action_redirect" value="%(redirect_name)s">
  <input type="hidden" name="policy" value="%(policy_name)s">
  <input type="hidden" name="signature" value="%(sig_name)s">
  <input type="hidden" name="Content-Type" value="%(content_name)s">

  <!-- Include any additional input fields here -->

  Browse to locate the file to upload:<br \> <br \>

  <input name="file" type="file"><br> <br \>
  <input type="submit" value="Upload File to S3"> 
 </form> 
</body>
</html>
""" % html_page_data

with open(HTML_NAME, "wb") as f:
    f.write(html_page)

###### Dump output if testing ######
if DEBUG:

    if 1: # Set true if not using the LEO editor
        class G:
            def es(self, data):print(data)
        g = G()

    items = [
    "",
    "",
    "policy_document: %s" % policy_document,
    "ploicy: %s" % policy,
    "signature: %s" % signature,
    "",
    "",
    ]
    for item in items:
        g.es(item)

Solution 2:

from boto.s3 import connection

###### EDIT ONLY THE FOLLOWING ITEMS ######

DEBUG = 1
AWS_SECRET_KEY = "MySecretKey"
AWS_ACCESS_KEY = "MyAccessKey"
HTML_NAME = "S3PostForm.html"
SUCCESS_NAME = "success.html"
EXPIRES = 60*60*24*356 # seconds = 1 year
BUCKET = "media.mysite.com"
KEY = "${filename}" # will match file entered by user
ACL = "public-read" # or "private"
SUCCESS = "http://media.mysite.com/success.html"
CONTENT_TYPE = "" # seems to work this way
CONTENT_LENGTH = 1024**3 # One gigabyte
HTTP_OR_HTTPS = "http" # Or https for better security
PAGE_TITLE = "My Html Upload to S3 Form"

###### DON'T EDIT FROM HERE ON DOWN ######

conn = connection.S3Connection(AWS_ACCESS_KEY,AWS_SECRET_KEY)
args = conn.build_post_form_args(
    BUCKET,
    KEY,
    expires_in=EXPIRES,
    acl=ACL,
    success_action_redirect=SUCCESS,
    max_content_length=CONTENT_LENGTH,
    http_method=HTTP_OR_HTTPS,
    fields=None,
    conditions=None,
    storage_class='STANDARD',
    server_side_encryption=None,
    )

form_fields = ""
line = '  <input type="hidden" name="%s" value="%s" >\n'
for item in args['fields']:
    new_line = line % (item["name"], item["value"])
    form_fields += new_line

html_page_data = {
"page_title": PAGE_TITLE,
"action": args["action"],
"input_fields": form_fields,
}

html_page = """
<html> 
 <head>
  <title>%(page_title)s</title> 
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
 </head>
<body>
 <form action="%(action)s" method="post" enctype="multipart/form-data" >
%(input_fields)s
  <!-- Include any additional input fields here -->

  Browse to locate the file to upload:<br \> <br \>

  <input name="file" type="file"><br> <br \>
  <input type="submit" value="Upload File to S3"> 
 </form> 
</body>
</html>
""" % html_page_data

with open(HTML_NAME, "wb") as f:
    f.write(html_page)

success_page = """
<html>
  <head>
    <title>S3 POST Success Page</title>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
      <script src="jquery.js"></script>
      <script src="purl.js"></script>
<!--

    Amazon S3 passes three data items in the url of this page if
        the upload was successful:
        bucket = bucket name
        key = file name upload to the bucket
        etag = hash of file

    The following script parses these values and puts them in
    the page to be displayed.

-->

<script type="text/javascript">
var pname,url,val,params=["bucket","key","etag"];
$(document).ready(function()
{
  url = $.url();
  for (param in params)
  {
    pname = params[param];
    val = url.param(pname);
    if(typeof val != 'undefined')
      document.getElementById(pname).value = val;
  }
});
</script>

  </head>
  <body>
      <div style="margin:0 auto;text-align:center;">
      <p>Congratulations!</p>
      <p>You have successfully uploaded the file.</p>
        <form action="#" method="get"
          >Location:
        <br />
          <input type="text" name="bucket" id="bucket" />
        <br />File Name:
        <br />
          <input type="text" name="key" id="key" />
        <br />Hash:
        <br />
          <input type="text" name="etag" id="etag" />
      </form>
    </div>
  </body>
</html>
"""

with open(SUCCESS_NAME, "wb") as f:
    f.write(success_page)

###### Dump output if testing ######
if DEBUG:

    if 1: # Set true if not using the LEO editor
        class G:
            def es(self, data):print(data)
        g = G()

    g.es("conn = %s" % conn)
    for key in args.keys():
        if key is not "fields":
            g.es("%s: %s" % (key, args[key]))
            continue
        for item in args['fields']:
            g.es(item)


I tried using Boto but found it didn't let me put in all of the headers I wanted. Below you can see what I do to generate the policy, signature, and a dictionary of post form values.

Note that all of the x-amz-meta-* tags are custom header properties and you don't need them. Also notice that pretty much everything that is going to be in the form needs to be in the policy that gets encoded and signed.

def generate_post_form(bucket_name, key, post_key, file_id, file_name, content_type):
  import hmac
  from hashlib import sha1
  from django.conf import settings
  policy = """{"expiration": "%(expires)s","conditions": [{"bucket":"%(bucket)s"},["eq","$key","%(key)s"],{"acl":"private"},{"x-amz-meta-content_type":"%(content_type)s"},{"x-amz-meta-file_name":"%(file_name)s"},{"x-amz-meta-post_key":"%(post_key)s"},{"x-amz-meta-file_id":"%(file_id)s"},{"success_action_status":"200"}]}"""
  policy = policy%{
    "expires":(datetime.utcnow()+settings.TIMEOUT).strftime("%Y-%m-%dT%H:%M:%SZ"), # This has to be formatted this way
    "bucket": bucket_name, # the name of your bucket
    "key": key, # this is the S3 key where the posted file will be stored
    "post_key": post_key, # custom properties begin here
    "file_id":file_id,
    "file_name": file_name,
    "content_type": content_type,
  }
  encoded = policy.encode('utf-8').encode('base64').replace("\n","") # Here we base64 encode a UTF-8 version of our policy.  Make sure there are no new lines, Amazon doesn't like them.
  return ("%s://%s.s3.amazonaws.com/"%(settings.HTTP_CONNECTION_TYPE, self.bucket_name),
          {"policy":encoded,
           "signature":hmac.new(settings.AWS_SECRET_KEY,encoded,sha1).digest().encode("base64").replace("\n",""), # Generate the policy signature using our Amazon Secret Key
           "key": key,
           "AWSAccessKeyId": settings.AWS_ACCESS_KEY, # Obviously the Amazon Access Key
           "acl":"private",
           "x-amz-meta-post_key":post_key,
           "x-amz-meta-file_id":file_id,
           "x-amz-meta-file_name": file_name,
           "x-amz-meta-content_type": content_type,
           "success_action_status":"200",
          })

The returned tuple can then be used to generate a form that posts to the generated S3 url with all of the key value pairs from the dictionary as hidden fields and your actual file input field, whose name/id should be "file".

Hope that helps as an example.


I've been struggling with this same exact problem, using almost the same exact code, for days. ( See Python Generated Signature for S3 Post ) Just tried to encode my policy in accordance with White Box Dev's code, but still not coming up with the same as AWS suggests I should have. I eventually gave up and used...

http://s3.amazonaws.com/doc/s3-example-code/post/post_sample.html

...and just inserted the values it returns into the HTML form. Works great.

@Mr. Oodles: if you're storing your aws_secret_key in a separate file, use bash command ls -al to check it's number of bytes prior to generating the signature. It should be 40 bytes long. As White Box Dev points out, AWS does not like \n, and it's possible you're bundling up this hidden character (or a carriage return or ^M) with your aws_secret_key string when saving it...hence making it 41 bytes long. You can try .replace("\n", "") or .rstrip() to get rid of it after reading it into your script, The .encode("utf-8") may work for you, too. None of those worked for me, however. Curious if you're running Python on Windows or Unix... You could also try using emacs to save the string without the \n being automatically inserted by your editor.


Try checking out https://github.com/burgalon/plupload-s3mixin It combines PLUPLOAD with direct S3 upload

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜