Decode base64 data as array in Python
I'm using this handy JavaScript function to decode a base64 string and get an array in return.
This is the string:
base64_decode_array('6gAAAOsAAADsAAAACAEAAAkBAAAKAQAAJgEAACcBAAAoAQAA')
This is what's returned:
234,0,0,0,235,0,0,0,236,0,0,0,8,1,0,0,9,1,0,0,10,1,0,0,38,1,0,0,39,1,0,0,40,1,0,0
The problem is that I don't really understand the JavaScript function:
var base64chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'.split("");
var base64inv = {};
for (var i = 0; i < base64chars.length; i++) {
    base64inv[base64chars[i]] = i;
}
function base64_decode_array (s)
{
    // remove/ignore any characters not in the base64 characters list
    // or the pad character -- particularly newlines
    s = s.replace(new RegExp('[^'+base64chars.join("")+'=]', 'g'), "");
    // replace any incoming padding with a zero pad (the 'A' character is zero)
    var p = (s.charAt(s.length-1) == '=' ?
            (s.charAt(s.length-2) == '=' ? 'AA' : 'A') : "");
    var r = [];
    s = s.substr(0, s.length - p.length) + p;
    // increment over the length of this encrypted string, four characters at a time
    for (var c = 0; c < s.length; c += 4) {
        // each of these four characters represents a 6-bit index in the base64 characters list
        // which, when concatenated, will give the 24-bit number for the original 3 characters
        var n = (base64inv[s.charAt(c)] << 18) + (base64inv[s.charAt(c+1)] << 12) +
                (base64inv[s.charAt(c+2)] << 6) + base64inv[s.charAt(c+3)];
        // split the 24-bit number into the original three 8-bit (ASCII) characters
        r.push((n >>> 16) & 255);
        r.push((n >>> 8) & 255);
        r.push(n & 255);
    }
    // remove any zero pad that was added to make this a multiple of 24 bits
    return r;
}
What do the "<<" and ">>>" operators do? And is there a function like this for Python?
Who cares. Python has easier ways of doing the same.
[ord(c) for c in '6gAAAOsAAADsAAAACAEAAAkBAAAKAQAAJgEAACcBAAAoAQAA'.decode('base64')]
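The one-liner above relies on Python 2, where str has a decode method and a 'base64' codec. On Python 3 that call is not available on str, so here is a minimal sketch of the same idea using the standard base64 module. The function name simply mirrors the JavaScript one; it is not a library function.

```python
import base64

def base64_decode_array(s):
    """Decode a base64 string and return the bytes as a list of ints (0-255)."""
    # b64decode accepts an ASCII str and returns a bytes object;
    # iterating over bytes in Python 3 already yields integers.
    return list(base64.b64decode(s))

values = base64_decode_array('6gAAAOsAAADsAAAACAEAAAkBAAAKAQAAJgEAACcBAAAoAQAA')
print(values[:8])   # first eight byte values
print(len(values))  # 48 base64 characters decode to 36 bytes
```

Unlike the JavaScript version, b64decode drops the bytes that come from '=' padding, so the result can be shorter for padded input.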
In Python I expected you'd just use the base64 module...
... but in response to your question about << and >>>:
<< is the left-shift operator; the result is the first operand shifted left by the second operand number of bits; for example, 5 << 2 is 20, as 5 is 101 in binary, and 20 is 10100.
>>> is the non-sign-extended right-shift operator; the result is the first operand shifted right by the second operand number of bits, with the leftmost bit always being filled with a 0.
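To make the shifts concrete in Python: << works the same way, but Python has no >>> operator. For the non-negative values used in the decoder, plain >> gives the same result. The helper below, unsigned_right_shift, is a hypothetical name of my own; it is a sketch of how one could emulate the 32-bit zero-filling shift for shift counts from 0 to 31.

```python
# Left shift: 5 is 0b101; shifting left by 2 bits gives 0b10100.
assert 5 << 2 == 20

def unsigned_right_shift(n, bits):
    """Emulate JavaScript's n >>> bits for 32-bit values (hypothetical helper)."""
    # Masking to 32 bits turns a negative number into its unsigned
    # two's-complement form, so the shift fills with zeros from the left.
    return (n & 0xFFFFFFFF) >> bits

print(-8 >> 1)                      # Python's >> keeps the sign
print(unsigned_right_shift(-8, 1))  # zero-filled, like JavaScript's >>>

# The decoder's core step for the first four characters '6gAA',
# whose indexes in the base64 alphabet are 58, 32, 0, 0:
n = (58 << 18) + (32 << 12) + (0 << 6) + 0
print([(n >> 16) & 255, (n >> 8) & 255, n & 255])
```

The last line packs four 6-bit values into one 24-bit number and then slices it back into three 8-bit values, which is all the JavaScript loop body does.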
Why not just:
from binascii import a2b_base64, b2a_base64
encoded_data = b2a_base64(some_string)
decoded_string = a2b_base64(encoded_data)
def base64_decode_array(string):
return [ord(c) for c in a2b_base64(string)]
Just for fun/completeness, I'll translate the javascript more literally: :)
# No particular reason to make a list of chars here instead of a string.
base64chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
lookup = dict((c, i) for (i, c) in enumerate(base64chars))
def base64_decode_array(s):
    # Filter out meaningless chars, especially newlines. No need for a regex.
    s = ''.join(c for c in s if c in base64chars + '=')
    # replace any incoming padding with a zero pad (the 'A' character is zero)
    # Their way:
    # p = ('AA' if s[-2] == '=' else 'A') if s[-1] == '=' else ''
    # s = s[:len(s) - len(p)] + p
    # My way (allows for more padding than that;
    # '=' will only appear at the end anyway):
    s = s.replace('=', 'A')
    r = []
    # Iterate over the string in blocks of 4 chars - an ugly hack
    # though we are preserving the original code's assumption that the text length
    # is a multiple of 4 (that's what the '=' padding is for) ;)
    for a, b, c, d in zip(*([iter(s)] * 4)):
        # Translate each letter in the quad into a 6-bit value and bit-shift them
        # together into a 24-bit value
        n = (lookup[a] << 18) + (lookup[b] << 12) + (lookup[c] << 6) + lookup[d]
        # split the 24-bit number into the original three 8-bit (ASCII) characters
        r += [(n >> 16) & 0xFF, (n >> 8) & 0xFF, n & 0xFF]
    return r
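One thing worth noting: the last comment in the JavaScript says the zero pad should be removed, but the function returns r as it is, so padded input gains one or two trailing zero bytes. Below is a sketch of a variant that trims them. The name decode_trimmed is my own, and it keeps the original assumption that the cleaned input length is a multiple of 4.

```python
import base64

BASE64_CHARS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
LOOKUP = {c: i for i, c in enumerate(BASE64_CHARS)}

def decode_trimmed(s):
    """Hand-rolled decoder that drops the bytes produced by '=' padding."""
    # Keep only alphabet characters and the pad character.
    s = ''.join(c for c in s if c in LOOKUP or c == '=')
    # Count trailing '=' characters, then swap them for 'A' (index 0).
    pad = len(s) - len(s.rstrip('='))
    s = s.rstrip('=') + 'A' * pad
    out = []
    for i in range(0, len(s), 4):
        a, b, c, d = (LOOKUP[ch] for ch in s[i:i + 4])
        n = (a << 18) + (b << 12) + (c << 6) + d
        out += [(n >> 16) & 0xFF, (n >> 8) & 0xFF, n & 0xFF]
    # Each '=' stood for one byte that was never in the original data.
    return out[:len(out) - pad]

for text in ('TQ==', 'TWE=', 'TWFu'):
    # Compare against the standard library as a sanity check.
    print(text, decode_trimmed(text), list(base64.b64decode(text)))
```

For the string in the question there is no padding, so both versions return the same 36 values.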