
Calculate the size of a Base 64 encoded message

I have a binary string that I am encoding in Base 64. I need to know beforehand what the size of the final Base 64 encoded string will be.

Is there any way to calculate that?

Something like:

BinaryStringSize is 64 KB; EncodedBinaryStringSize will be 127 KB after encoding.

Oh, the code is in C.

Thanks.


If you do Base64 exactly right, and that includes padding the end with = characters, and you break it up with a CR LF every 72 characters, the answer can be found with:

code_size    = ((input_size * 4) + 2) / 3;
padding_size = (input_size % 3) ? (3 - (input_size % 3)) : 0;
crlfs_size   = 2 + (2 * (code_size + padding_size) / 72);
total_size   = code_size + padding_size + crlfs_size;

In C, you will probably also terminate the string with a \0 byte, which adds one more byte, and you may want to check the remaining length after each code you write. So if you are just looking for a number to pass to malloc(), you might prefer a version that wastes a few bytes in order to make the coding simpler:

output_size = ((input_size * 4) / 3) + (input_size / 27) + 7;
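
If an exact figure is preferred over a rough bound, the allocation size can also be computed directly. The following is a minimal sketch rather than a definitive implementation: the names b64_alloc_size and b64_alloc_buffer are hypothetical, and it assumes padded output, a CR LF after every line (including a final partial line), and one byte for the terminating \0.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: bytes to allocate for padded Base64 output broken
 * into lines of line_len characters, each followed by CR LF (including a
 * final partial line), plus one byte for the terminating '\0'.
 * Returns 0 if line_len is 0 or the result would not fit in size_t. */
size_t b64_alloc_size(size_t input_size, size_t line_len)
{
    if (line_len == 0)
        return 0;

    /* Number of 3-byte groups, rounded up. */
    size_t groups = input_size / 3 + (input_size % 3 != 0);
    if (groups > (SIZE_MAX - 1) / 4)
        return 0;
    size_t padded = groups * 4;

    /* Number of output lines, rounded up. */
    size_t lines = padded / line_len + (padded % line_len != 0);
    if (lines > (SIZE_MAX - 1 - padded) / 2)
        return 0;

    return padded + 2 * lines + 1;
}

/* Hypothetical wrapper: allocates a buffer for 72-character lines. */
char *b64_alloc_buffer(size_t input_size)
{
    size_t n = b64_alloc_size(input_size, 72);
    return n ? malloc(n) : NULL;
}
```

The overflow checks only matter for inputs close to SIZE_MAX, but they keep the helper from silently returning a size that is too small.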


geocar's answer was close, but could sometimes be off slightly.

There are 4 bytes of output for every 3 bytes of input. If the input size is not a multiple of three, we must round it up to the next multiple of three. Otherwise, leave it alone.

input_size + ( (input_size % 3) ? (3 - (input_size % 3)) : 0) 

Divide this by 3, then multiply by 4. That is our total output size, including padding.

code_padded_size = ((input_size + ( (input_size % 3) ? (3 - (input_size % 3)) : 0) ) / 3) * 4

As I said in my comment, the total size must be divided by the line width before doubling to properly account for the last line. Otherwise the number of CRLF characters will be overestimated. I am also assuming that a CRLF pair follows only lines that are a full 72 characters long. That includes the last line if it is exactly 72 characters, but not if it is shorter.

newline_size = ((code_padded_size) / 72) * 2
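
The effect of the order of operations can be seen with a small sketch. The two function names below are hypothetical and exist only to compare the two orderings under integer division.

```c
#include <stddef.h>

/* Doubles first, then divides: a line that is only half full
 * already contributes one byte, so the result can be odd. */
size_t crlf_bytes_double_first(size_t encoded_size)
{
    return 2 * encoded_size / 72;
}

/* Divides first, then doubles: exactly one CR LF pair (2 bytes)
 * per complete 72-character line. */
size_t crlf_bytes_divide_first(size_t encoded_size)
{
    return (encoded_size / 72) * 2;
}
```

For an encoded size of 108 characters (one full line plus half a line), the first version yields 3 bytes and the second yields 2.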

So put it all together:

unsigned int code_padded_size = ((input_size + ( (input_size % 3) ? (3 - (input_size % 3)) : 0) ) / 3) * 4;
unsigned int newline_size = ((code_padded_size) / 72) * 2;

unsigned int total_size = code_padded_size + newline_size;

Or to make it a bit more readable:

unsigned int adjustment = ( (input_size % 3) ? (3 - (input_size % 3)) : 0);
unsigned int code_padded_size = ( (input_size + adjustment) / 3) * 4;
unsigned int newline_size = ((code_padded_size) / 72) * 2;

unsigned int total_size = code_padded_size + newline_size;


Here is a simple C implementation (without modulus and ternary operators) for the raw Base64-encoded size (with standard '=' padding):

int output_size;
output_size = ((input_size + 2) / 3) * 4;

To that you will need to add any additional overhead for CRLF line breaks if required. The standard Base64 encoding (RFC 3548 or RFC 4648) does not require line breaks; formats built on it may add them, typically after 64 characters (PEM) or 76 characters (MIME). The MIME variant (RFC 2045) limits encoded lines to at most 76 characters.

For example, the total encoded length using 76 character lines building on the above:

int final_size;
final_size = output_size + (output_size / 76) * 2;
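
As a worked example, the two steps can be wrapped in functions and applied to the 64 KB input from the question. This is only a sketch: the function names are hypothetical, and the padded size is computed by rounding the input up to whole 3-byte groups.

```c
#include <stddef.h>

/* Padded Base64 length: 4 output characters per 3-byte group, rounded up. */
size_t b64_padded_size(size_t input_size)
{
    return ((input_size + 2) / 3) * 4;
}

/* Adds 2 bytes (CR LF) for every complete 76-character line. */
size_t b64_mime_size(size_t input_size)
{
    size_t output_size = b64_padded_size(input_size);
    return output_size + (output_size / 76) * 2;
}
```

For a 65536-byte input this gives 87384 characters without line breaks and 89682 bytes with them, so roughly 85 to 88 KB rather than 127 KB.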

See the Base64 Wikipedia entry for more variants.


Check out the b64 library. The function b64_encode2() can give a maximum estimate of the required size if you pass NULL, so you can allocate memory with certainty, and then call again passing the buffer and have it do the conversion.


I ran into a similar situation in Python, and using codecs.iterencode(text, "base64") the correct calculation was:

adjustment = 3 - (input_size % 3) if (input_size % 3) else 0
code_padded_size = ((input_size + adjustment) // 3) * 4
newline_size = (code_padded_size // 76) * 1
return code_padded_size + newline_size


Base 64 transforms 3 bytes into 4.

If your set of bits does not happen to be a multiple of 24 bits, you must pad it out so that it is a multiple of 24 bits (3 bytes).
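
To make the padding rule concrete, here is a minimal encoder sketch, not a production implementation. The name b64_encode_sketch is hypothetical; it assumes the standard alphabet, '=' padding, and no line breaks.

```c
#include <stddef.h>

static const char B64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encodes len bytes from src into dst as padded Base64 (no line breaks).
 * dst must have room for ((len + 2) / 3) * 4 + 1 bytes.
 * Returns the number of characters written, excluding the '\0'. */
size_t b64_encode_sketch(const unsigned char *src, size_t len, char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i += 3) {
        size_t remaining = len - i;

        /* Pack up to 3 bytes into 24 bits; missing bytes count as zero. */
        unsigned long group = (unsigned long)src[i] << 16;
        if (remaining > 1) group |= (unsigned long)src[i + 1] << 8;
        if (remaining > 2) group |= (unsigned long)src[i + 2];

        /* Emit four 6-bit values; positions with no input data become '='. */
        dst[out++] = B64_ALPHABET[(group >> 18) & 0x3F];
        dst[out++] = B64_ALPHABET[(group >> 12) & 0x3F];
        dst[out++] = remaining > 1 ? B64_ALPHABET[(group >> 6) & 0x3F] : '=';
        dst[out++] = remaining > 2 ? B64_ALPHABET[group & 0x3F] : '=';
    }
    dst[out] = '\0';
    return out;
}
```

With this sketch, the inputs "M", "Ma" and "Man" encode to "TQ==", "TWE=" and "TWFu": the output is always 4 characters per started 3-byte group.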


I think this formula should work for the length without '=' padding:

b64len = (size * 8 + 5) / 6
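
The relationship between this count and the padded length can be sketched as follows. The function names are hypothetical; the second function simply rounds the first result up to a multiple of 4, which is the effect of the trailing '=' characters.

```c
#include <stddef.h>

/* Number of Base64 characters that carry data: ceil(size * 8 / 6). */
size_t b64_unpadded_len(size_t size)
{
    return (size * 8 + 5) / 6;
}

/* Length once '=' padding rounds the output up to a multiple of 4. */
size_t b64_padded_len(size_t size)
{
    return (b64_unpadded_len(size) + 3) / 4 * 4;
}
```

For 1, 2, 3 and 4 input bytes the unpadded lengths are 2, 3, 4 and 6, while the padded lengths are 4, 4, 4 and 8.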


 if (inputSize == 0) return 0;

 int size = ((inputSize - 1) / 3) * 4 + 4;
 int nlines = (size - 1)/ maxLine + 1;
 return size + nlines * 2;

This formula adds a CRLF after every line, including a terminating CRLF after a last line that is shorter than the maximum line length (MIME, RFC 2045).


The actual length of MIME-compliant base64-encoded binary data is usually about 137% of the original data length, though for very short messages the overhead can be a lot higher because of the overhead of the headers. Very roughly, the final size of base64-encoded binary data is equal to 1.37 times the original data size + 814 bytes (for headers).

In other words, you can approximate the encoded and decoded sizes with these formulas:

BytesNeededForEncoding = (string_length(base_string) * 1.37) + 814;
BytesNeededForDecoding = (string_length(encoded_string) - 814) / 1.37;

Source: http://en.wikipedia.org/wiki/Base64
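
The approximation above accounts for MIME headers and line breaks. For a bare, padded Base64 string without either, the decoded size can be computed exactly. A minimal sketch, with the hypothetical name b64_decoded_size, assuming the input length is a multiple of 4 as padded Base64 requires:

```c
#include <stddef.h>

/* Exact decoded size of a padded Base64 string of encoded_len characters
 * that contains no line breaks or headers. Returns 0 for an empty string
 * or a length that is not a multiple of 4. */
size_t b64_decoded_size(const char *encoded, size_t encoded_len)
{
    if (encoded_len == 0 || encoded_len % 4 != 0)
        return 0;

    /* Each trailing '=' stands for one missing input byte (at most two). */
    size_t padding = 0;
    if (encoded[encoded_len - 1] == '=') padding++;
    if (encoded[encoded_len - 2] == '=') padding++;

    return encoded_len / 4 * 3 - padding;
}
```

For example, "TQ==" decodes to 1 byte, "TWE=" to 2 bytes and "TWFu" to 3 bytes.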
