Base64 encoding and decoding with OpenSSL
I've been trying to figure out the OpenSSL documentation for base64 decoding and encoding. I found the code snippets below:
#include <openssl/sha.h>
#include <openssl/hmac.h>
#include <openssl/evp.h>
#include <openssl/bio.h>
#include <openssl/buffer.h>
char *base64(const unsigned char *input, int length)
{
BIO *bmem, *b64;
BUF_MEM *bptr;
b64 = BIO_new(BIO_f_base64());
bmem = BIO_new(BIO_s_mem());
b64 = BIO_push(b64, bmem);
BIO_write(b64, input, length);
BIO_flush(b64);
BIO_get_mem_ptr(b64, &bptr);
char *buff = (char *)malloc(bptr->length);
memcpy(buff, bptr->data, bptr->length-1);
buff[bptr->length-1] = 0;
BIO_free_all(b64);
return buff;
}
char *decode64(unsigned char *input, int length)
{
BIO *b64, *bmem;
char *buffer = (char *)malloc(length);
memset(buffer, 0, length);
b64 = BIO_new(BIO_f_base64());
bmem = BIO_new_mem_buf(input, length);
bmem = BIO_push(b64, bmem);
BIO_read(bmem, buffer, length);
BIO_free_all(bmem);
return buffer;
}
This only seems to work for single-line strings such as "Start"; the moment I introduce complex strings with newlines, spaces, etc., it fails horribly.
It doesn't even have to be OpenSSL; a simple class or set of functions that does the same thing would be fine. There's a very complicated build process for the solution, and I am trying to avoid having to go in there and make multiple changes. The only reason I went for OpenSSL is that the solution is already compiled with the libraries.
Personally, I find the OpenSSL API so incredibly painful to use that I avoid it unless the cost of avoiding it is extremely high. I find it quite upsetting that it has become the standard API in the crypto world.
I was feeling bored, so I wrote you one in C++. This one should even handle the edge cases that can cause security problems, such as encoding a string so large that the size calculation overflows an integer.
I have done some unit testing on it, so it should work.
#include <string>
#include <cassert>
#include <limits>
#include <stdexcept>
#include <cctype>
static const char b64_table[65] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
static const char reverse_table[128] = {
64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 62, 64, 64, 64, 63,
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 64, 64, 64, 64, 64, 64,
64, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 64, 64, 64, 64, 64,
64, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 64, 64, 64, 64, 64
};
::std::string base64_encode(const ::std::string &bindata)
{
using ::std::string;
using ::std::numeric_limits;
if (bindata.size() > (numeric_limits<string::size_type>::max() / 4u) * 3u) {
throw ::std::length_error("Converting too large a string to base64.");
}
const ::std::size_t binlen = bindata.size();
// Use = signs so the end is properly padded.
string retval((((binlen + 2) / 3) * 4), '=');
::std::size_t outpos = 0;
int bits_collected = 0;
unsigned int accumulator = 0;
const string::const_iterator binend = bindata.end();
for (string::const_iterator i = bindata.begin(); i != binend; ++i) {
accumulator = (accumulator << 8) | (*i & 0xffu);
bits_collected += 8;
while (bits_collected >= 6) {
bits_collected -= 6;
retval[outpos++] = b64_table[(accumulator >> bits_collected) & 0x3fu];
}
}
if (bits_collected > 0) { // Any trailing bits that are missing.
assert(bits_collected < 6);
accumulator <<= 6 - bits_collected;
retval[outpos++] = b64_table[accumulator & 0x3fu];
}
assert(outpos >= (retval.size() - 2));
assert(outpos <= retval.size());
return retval;
}
::std::string base64_decode(const ::std::string &ascdata)
{
using ::std::string;
string retval;
const string::const_iterator last = ascdata.end();
int bits_collected = 0;
unsigned int accumulator = 0;
for (string::const_iterator i = ascdata.begin(); i != last; ++i) {
const int c = *i;
if (::std::isspace(c) || c == '=') {
// Skip whitespace and padding. Be liberal in what you accept.
continue;
}
if ((c > 127) || (c < 0) || (reverse_table[c] > 63)) {
throw ::std::invalid_argument("This contains characters not legal in a base64 encoded string.");
}
accumulator = (accumulator << 6) | reverse_table[c];
bits_collected += 6;
if (bits_collected >= 8) {
bits_collected -= 8;
retval += static_cast<char>((accumulator >> bits_collected) & 0xffu);
}
}
return retval;
}
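The overflow guard at the top of base64_encode is easier to see in isolation. Below is a minimal sketch of just the size calculation; base64_encoded_size is a hypothetical helper name that is not part of the answer's code.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical helper (not in the answer above): number of base64 characters
// needed for `binlen` input bytes, including '=' padding.
std::size_t base64_encoded_size(std::size_t binlen)
{
    const std::size_t max = std::numeric_limits<std::size_t>::max();
    // ((binlen + 2) / 3) * 4 must not wrap around, so reject inputs where it could.
    if (binlen > (max / 4u) * 3u) {
        throw std::length_error("Input too large to base64-encode.");
    }
    const std::size_t size = ((binlen + 2) / 3) * 4;
    assert(size >= binlen);  // sanity: the output is never shorter than the input
    return size;
}
```

For example, 5 input bytes ("hello") need 8 output characters, and an input close to the maximum size_t value is rejected instead of silently wrapping.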
Rather than using the BIO_ interface, it's much easier to use the EVP_ interface. For instance:
#include <iostream>
#include <stdlib.h>
#include <openssl/evp.h>
char *base64(const unsigned char *input, int length) {
const auto pl = 4*((length+2)/3);
auto output = reinterpret_cast<char *>(calloc(pl+1, 1)); //+1 for the terminating null that EVP_EncodeBlock adds on
const auto ol = EVP_EncodeBlock(reinterpret_cast<unsigned char *>(output), input, length);
if (pl != ol) { std::cerr << "Whoops, encode predicted " << pl << " but we got " << ol << "\n"; }
return output;
}
unsigned char *decode64(const char *input, int length) {
const auto pl = 3*length/4;
auto output = reinterpret_cast<unsigned char *>(calloc(pl+1, 1));
const auto ol = EVP_DecodeBlock(output, reinterpret_cast<const unsigned char *>(input), length);
if (pl != ol) { std::cerr << "Whoops, decode predicted " << pl << " but we got " << ol << "\n"; }
return output;
}
The EVP functions include a streaming interface too, see the man page.
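One thing to watch with decode64 above: as far as I can tell from the man page, EVP_DecodeBlock does not strip padding, so it always reports 3 bytes for every 4 input characters, and the bytes that correspond to '=' are zero filler. A rough sketch of how the real payload length could be worked out follows; decoded_payload_size is a hypothetical helper, and it assumes padded input with no whitespace.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: how many of the bytes produced by a block decoder are
// real payload. Assumes `encoded` is padded base64 with no whitespace, so its
// length is a multiple of 4.
std::size_t decoded_payload_size(const std::string& encoded)
{
    assert(encoded.size() % 4 == 0);
    std::size_t padding = 0;
    if (!encoded.empty() && encoded[encoded.size() - 1] == '=') ++padding;
    if (encoded.size() >= 2 && encoded[encoded.size() - 2] == '=') ++padding;
    // Every 4 characters decode to 3 bytes; each '=' stands for one filler byte.
    return encoded.size() / 4 * 3 - padding;
}
```

With this, "aGVsbG8=" gives 5 rather than the 6 that the `3*length/4` prediction yields.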
Here is an example of OpenSSL base64 encode/decode I wrote.
Note that the code uses some macros and classes of my own, but none of them matters for the example; they are simply C++ wrappers I wrote:
buffer base64::encode( const buffer& data )
{
// bio is simply a class that wraps BIO* and frees the BIO in its destructor.
bio b64(BIO_f_base64()); // create BIO to perform base64
BIO_set_flags(b64,BIO_FLAGS_BASE64_NO_NL);
bio mem(BIO_s_mem()); // create BIO that holds the result
// chain base64 with mem, so writing to b64 will encode base64 and write to mem.
BIO_push(b64, mem);
// write data
bool done = false;
int res = 0;
while(!done)
{
res = BIO_write(b64, data.data, (int)data.size);
if(res <= 0) // if failed
{
if(BIO_should_retry(b64)){
continue;
}
else // encoding failed
{
/* Handle Error!!! */
}
}
else // success!
done = true;
}
BIO_flush(b64);
// get a pointer to mem's data
char* dt;
long len = BIO_get_mem_data(mem, &dt);
// assign data to output
std::string s(dt, len);
return buffer(s.length()+sizeof(char), (byte*)s.c_str());
}
This works for me, and I verified with valgrind that there are no memory leaks.
#include <openssl/bio.h>
#include <openssl/evp.h>
#include <cstring>
#include <memory>
#include <string>
#include <vector>
#include <iostream>
namespace {
struct BIOFreeAll { void operator()(BIO* p) { BIO_free_all(p); } };
}
std::string Base64Encode(const std::vector<unsigned char>& binary)
{
std::unique_ptr<BIO,BIOFreeAll> b64(BIO_new(BIO_f_base64()));
BIO_set_flags(b64.get(), BIO_FLAGS_BASE64_NO_NL);
BIO* sink = BIO_new(BIO_s_mem());
BIO_push(b64.get(), sink);
BIO_write(b64.get(), binary.data(), binary.size());
BIO_flush(b64.get());
const char* encoded;
const long len = BIO_get_mem_data(sink, &encoded);
return std::string(encoded, len);
}
// Assumes no newlines or extra characters in encoded string
std::vector<unsigned char> Base64Decode(const char* encoded)
{
std::unique_ptr<BIO,BIOFreeAll> b64(BIO_new(BIO_f_base64()));
BIO_set_flags(b64.get(), BIO_FLAGS_BASE64_NO_NL);
BIO* source = BIO_new_mem_buf(encoded, -1); // read-only source
BIO_push(b64.get(), source);
const int maxlen = strlen(encoded) / 4 * 3 + 1;
std::vector<unsigned char> decoded(maxlen);
const int len = BIO_read(b64.get(), decoded.data(), maxlen);
decoded.resize(len);
return decoded;
}
int main()
{
const char* msg = "hello";
const std::vector<unsigned char> binary(msg, msg+strlen(msg));
const std::string encoded = Base64Encode(binary);
std::cout << "encoded = " << encoded << std::endl;
const std::vector<unsigned char> decoded = Base64Decode(encoded.c_str());
std::cout << "decoded = ";
for (unsigned char c : decoded) std::cout << c;
std::cout << std::endl;
return 0;
}
Compile:
g++ main.cc -lcrypto
Output:
encoded = aGVsbG8=
decoded = hello
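Base64Decode above assumes the encoded string contains no newlines or other extra characters. If the input might have been line-wrapped, one option is to strip whitespace first. This is only a sketch, and strip_whitespace is a hypothetical helper that is not part of the answer's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of `text` with all whitespace removed,
// so that a decoder expecting one unbroken line can handle wrapped input.
std::string strip_whitespace(std::string text)
{
    text.erase(std::remove_if(text.begin(), text.end(),
                              [](unsigned char c) { return std::isspace(c) != 0; }),
               text.end());
    assert(text.find('\n') == std::string::npos);  // sanity: no newlines remain
    return text;
}
```

The result could then be passed on, for example as Base64Decode(strip_whitespace(wrapped).c_str()).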
So many horrible C code examples with buffers and malloc(); what about using std::string properly on this C++ tagged question?
#include <openssl/bio.h>
#include <openssl/evp.h>
#include <openssl/buffer.h>
#include <string>
std::string base64_encode(const std::string& input)
{
const auto base64_memory = BIO_new(BIO_s_mem());
auto base64 = BIO_new(BIO_f_base64());
base64 = BIO_push(base64, base64_memory);
BIO_write(base64, input.c_str(), static_cast<int>(input.length()));
BIO_flush(base64);
BUF_MEM* buffer_memory{};
BIO_get_mem_ptr(base64, &buffer_memory);
auto base64_encoded = std::string(buffer_memory->data, buffer_memory->length - 1);
BIO_free_all(base64);
return base64_encoded;
}
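Note that `length - 1` above only drops the final character. Assuming the base64 BIO's default behaviour of breaking its output into 64-character lines (which is what BIO_FLAGS_BASE64_NO_NL turns off), longer inputs would still carry embedded newlines. The sketch below only mimics that layout so you can see what to expect; wrap_lines is a hypothetical helper and does not call OpenSSL.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper that mimics the line layout described above: a newline
// after every `width` characters and after the final partial line.
std::string wrap_lines(const std::string& text, std::size_t width = 64)
{
    assert(width > 0);
    std::string out;
    for (std::size_t pos = 0; pos < text.size(); pos += width) {
        out.append(text, pos, width);  // copies at most `width` characters
        out.push_back('\n');
    }
    return out;
}
```

A 70-character encoding comes out as 64 characters, a newline, 6 characters, and a final newline, so removing one trailing character still leaves the newline in the middle.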
I like mtrw's use of EVP.
Below is my "modern C++" take on his answer without manual memory allocation (calloc). It will take a std::string but it can easily be overloaded to use raw bytes for example.
#include <openssl/evp.h>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>
auto EncodeBase64(const std::string& to_encode) -> std::string {
/// @sa https://www.openssl.org/docs/manmaster/man3/EVP_EncodeBlock.html
const auto predicted_len = 4 * ((to_encode.length() + 2) / 3); // predict output size
const auto output_buffer{std::make_unique<char[]>(predicted_len + 1)};
const std::vector<unsigned char> vec_chars{to_encode.begin(), to_encode.end()}; // convert to_encode into uchar container
const auto output_len = EVP_EncodeBlock(reinterpret_cast<unsigned char*>(output_buffer.get()), vec_chars.data(), static_cast<int>(vec_chars.size()));
if (predicted_len != static_cast<unsigned long>(output_len)) {
throw std::runtime_error("EncodeBase64 error");
}
return output_buffer.get();
}
auto DecodeBase64(const std::string& to_decode) -> std::string {
/// @sa https://www.openssl.org/docs/manmaster/man3/EVP_DecodeBlock.html
const auto predicted_len = 3 * to_decode.length() / 4; // predict output size
const auto output_buffer{std::make_unique<char[]>(predicted_len + 1)};
const std::vector<unsigned char> vec_chars{to_decode.begin(), to_decode.end()}; // convert to_decode into uchar container
const auto output_len = EVP_DecodeBlock(reinterpret_cast<unsigned char*>(output_buffer.get()), vec_chars.data(), static_cast<int>(vec_chars.size()));
if (predicted_len != static_cast<unsigned long>(output_len)) {
throw std::runtime_error("DecodeBase64 error");
}
return output_buffer.get();
}
There's probably a cleaner/better way of doing this (I'd like to get rid of reinterpret_cast). You'll also definitely want a try/catch block to deal with the potential exception.
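One more caveat, if the decoded data can be binary: `return output_buffer.get();` builds the std::string from a C string, so it stops at the first zero byte. The sketch below shows the difference between the two std::string constructors; both function names are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stops at the first '\0', like `return output_buffer.get();` does.
std::string from_c_string(const char* buffer)
{
    assert(buffer != nullptr);
    return std::string(buffer);
}

// Keeps exactly `length` bytes, including any embedded '\0'.
std::string from_buffer(const char* buffer, std::size_t length)
{
    assert(buffer != nullptr);
    return std::string(buffer, length);
}
```

For a buffer holding 'a', '\0', 'b', the first version returns a 1-character string and the second returns all 3 bytes, so passing the length explicitly is probably the safer choice for decoded output.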
An improved version of TCS's answer, with the macros and data structures removed:
unsigned char *encodeb64mem( unsigned char *data, int len, int *lenoutput )
{
// Raw BIO* pointers are used here instead of a wrapper class; nothing in this
// function frees them, and the returned pointer refers to memory owned by mem.
BIO *b64 = BIO_new(BIO_f_base64()); // create BIO to perform base64
BIO_set_flags(b64, BIO_FLAGS_BASE64_NO_NL);
BIO *mem = BIO_new(BIO_s_mem()); // create BIO that holds the result
// chain base64 with mem, so writing to b64 will encode base64 and write to mem.
BIO_push(b64, mem);
// write data
bool done = false;
int res = 0;
while(!done)
{
res = BIO_write(b64, data, len);
if(res <= 0) // if failed
{
if(BIO_should_retry(b64)){
continue;
}
else // encoding failed
{
/* Handle Error!!! */
}
}
else // success!
done = true;
}
BIO_flush(b64);
// get a pointer to mem's data
unsigned char* output;
*lenoutput = BIO_get_mem_data(mem, &output);
// assign data to output
//std::string s(dt, len2);
return output;
}
To write to a file:
int encodeb64(unsigned char* input, const char* filenm, int leni)
{
BIO *b64 = BIO_new(BIO_f_base64());
BIO_set_flags(b64,BIO_FLAGS_BASE64_NO_NL);
BIO *file = BIO_new_file(filenm, "w");
BIO *mem = BIO_new(BIO_f_buffer());
BIO_push(b64, mem);
BIO_push(mem, file);
// write data
bool done = false;
int res = 0;
while(!done)
{
res = BIO_write(b64, input, leni);
if(res <= 0) // if failed
{
if(BIO_should_retry(b64)){
continue;
}
else // encoding failed
{
/* Handle Error!!! */
}
}
else // success!
done = true;
}
BIO_flush(b64);
BIO_pop(b64);
BIO_free_all(b64);
return 0;
}
Base64 encoding from file to file. Often, because of constraints on file size, we have to read the data in chunks and encode it chunk by chunk. The code is below.
int encodeb64FromFile(const char* input, const char* outputfilename)
{
BIO *b64 = BIO_new(BIO_f_base64());
BIO_set_flags(b64,BIO_FLAGS_BASE64_NO_NL);
int leni = 3*64;
unsigned char data[3*64]; // byte buffer for one chunk (a multiple of 3 bytes)
BIO *file = BIO_new_file(outputfilename, "w");
BIO *mem = BIO_new(BIO_f_buffer());
BIO_push(b64, mem);
BIO_push(mem, file);
FILE *fp = fopen(input, "rb");
while ((leni = fread(data, 1, sizeof data, fp)) > 0) {
// write data
bool done = false;
int res = 0;
while(!done)
{
res = BIO_write(b64, data, leni);
if(res <= 0) // if failed
{
if(BIO_should_retry(b64)){
continue;
}
else // encoding failed
{
/* Handle Error!!! */
}
}
else // success!
done = true;
}
}
BIO_flush(b64);
BIO_pop(b64);
BIO_free_all(b64);
fclose(fp);
return 0;
}
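As far as I can tell, the BIO chain above carries leftover bytes from one BIO_write to the next, so the chunk size there is mostly a matter of convenience. If you instead encode each chunk on its own with a block encoder, the chunk size has to be a multiple of 3, otherwise padding ends up in the middle of the output. The sketch below illustrates this with a small hand-written encoder; encode_block and encode_in_chunks are hypothetical names, and nothing here calls OpenSSL.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Minimal block encoder, written only for this illustration.
std::string encode_block(const std::string& bytes)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    unsigned int accumulator = 0;
    int bits = 0;
    for (unsigned char byte : bytes) {
        accumulator = (accumulator << 8) | byte;
        bits += 8;
        while (bits >= 6) {
            bits -= 6;
            out.push_back(table[(accumulator >> bits) & 0x3fu]);
        }
    }
    if (bits > 0) {  // flush the last 2 or 4 bits, left-aligned in a 6-bit group
        out.push_back(table[(accumulator << (6 - bits)) & 0x3fu]);
    }
    while (out.size() % 4 != 0) out.push_back('=');  // pad to a multiple of 4
    return out;
}

// Encodes `bytes` chunk by chunk; `chunk_size` should be a multiple of 3.
std::string encode_in_chunks(const std::string& bytes, std::size_t chunk_size)
{
    assert(chunk_size > 0);
    std::string out;
    for (std::size_t pos = 0; pos < bytes.size(); pos += chunk_size) {
        out += encode_block(bytes.substr(pos, chunk_size));
    }
    return out;
}
```

With a chunk size of 3, "hello" encodes to the same "aGVsbG8=" as encoding it in one go; with a chunk size of 4 the result contains "==" in the middle and is no longer valid as a single base64 string.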
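#include <openssl/bio.h>
#include <openssl/buffer.h> // BUF_MEM
#include <openssl/evp.h>    // BIO_f_base64
#include <stdlib.h>         // malloc, realloc
#include <string.h>         // memcpy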
typedef unsigned char byte;
namespace base64 {
static void Encode(const byte* in, size_t in_len,
char** out, size_t* out_len) {
BIO *buff, *b64f;
BUF_MEM *ptr;
b64f = BIO_new(BIO_f_base64());
buff = BIO_new(BIO_s_mem());
buff = BIO_push(b64f, buff);
BIO_set_flags(buff, BIO_FLAGS_BASE64_NO_NL);
BIO_set_close(buff, BIO_CLOSE);
BIO_write(buff, in, in_len);
BIO_flush(buff);
BIO_get_mem_ptr(buff, &ptr);
(*out_len) = ptr->length;
(*out) = (char *) malloc(((*out_len) + 1) * sizeof(char));
memcpy(*out, ptr->data, (*out_len));
(*out)[(*out_len)] = '\0';
BIO_free_all(buff);
}
static void Decode(const char* in, size_t in_len,
byte** out, size_t* out_len) {
BIO *buff, *b64f;
b64f = BIO_new(BIO_f_base64());
buff = BIO_new_mem_buf((void *)in, in_len);
buff = BIO_push(b64f, buff);
(*out) = (byte *) malloc(in_len * sizeof(char));
BIO_set_flags(buff, BIO_FLAGS_BASE64_NO_NL);
BIO_set_close(buff, BIO_CLOSE);
(*out_len) = BIO_read(buff, (*out), in_len);
(*out) = (byte *) realloc((void *)(*out), ((*out_len) + 1) * sizeof(byte));
(*out)[(*out_len)] = '\0';
BIO_free_all(buff);
}
}
Base64 is really pretty simple; you shouldn't have trouble finding any number of implementations via a quick Google. For example here is a reference implementation in C from the Internet Software Consortium, with detailed comments explaining the process.
The openssl implementation layers a lot of complexity with the "BIO" stuff that's not (IMHO) very useful if all you're doing is decoding/encoding.
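To give an idea of how simple the core step is, here is a minimal sketch of encoding one group of 3 bytes into 4 characters. encode_triplet is a hypothetical name, and this is not taken from the reference implementation mentioned above; it leaves out padding and input of other lengths.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encodes exactly 3 bytes as 4 base64 characters.
std::string encode_triplet(unsigned char a, unsigned char b, unsigned char c)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    // Pack the 3 bytes into 24 bits, then read them back 6 bits at a time.
    const unsigned int bits = (static_cast<unsigned int>(a) << 16) |
                              (static_cast<unsigned int>(b) << 8) |
                              static_cast<unsigned int>(c);
    std::string out;
    for (int shift = 18; shift >= 0; shift -= 6) {
        out.push_back(table[(bits >> shift) & 0x3fu]);
    }
    assert(out.size() == 4);
    return out;
}
```

For instance, the bytes 'h', 'e', 'l' come out as "aGVs", which is the start of the "aGVsbG8=" encoding of "hello".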
Late to the party, but I ran into this problem recently myself. I was unhappy with the BIO solution, which is unnecessarily convoluted, and I did not like 'EncodeBlock' either, because it introduces newline characters I do not want in my Base64 encoded string.
After a little sniffing, I came across the header file openssl/include/crypto/evp.h, which is not part of the default installation (which only exports the include/openssl folder for me), but exports the solution to the problem.
void evp_encode_ctx_set_flags(EVP_ENCODE_CTX *ctx, unsigned int flags);
/* EVP_ENCODE_CTX flags */
/* Don't generate new lines when encoding */
#define EVP_ENCODE_CTX_NO_NEWLINES 1
/* Use the SRP base64 alphabet instead of the standard one */
#define EVP_ENCODE_CTX_USE_SRP_ALPHABET 2
Using this function, the 'no newline' becomes possible using the EVP interface.
Example:
if (EVP_ENCODE_CTX *context = EVP_ENCODE_CTX_new())
{
EVP_EncodeInit(context);
evp_encode_ctx_set_flags(context, EVP_ENCODE_CTX_NO_NEWLINES);
while (hasData())
{
uint8_t *data;
int32_t length = fetchData(&data);
int32_t size = (((EVP_ENCODE_CTX_num(context) + length)/48) * 65) + 1;
uint8_t buffer[size];
EVP_EncodeUpdate(context, buffer, &size, data, length);
//process encoded data.
}
uint8_t buffer[65];
int32_t writtenBytes;
EVP_EncodeFinal(context, buffer, &writtenBytes);
//Do something with the final remainder of the encoded string.
EVP_ENCODE_CTX_free(context);
}
This piece of code will encode the buffer to Base64 without the newlines.
Please note the use of EVP_ENCODE_CTX_num to obtain the 'leftover bytes' still stored in the context object to calculate the correct buffer size. It is only necessary if you need to call EVP_EncodeUpdate multiple times, because your data is exceedingly large or not available at once.
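The buffer size arithmetic from the example can be checked on its own. The sketch below wraps the same formula in a hypothetical helper, encode_update_capacity; my reading is that 48 input bytes make one output line of 64 characters plus a newline, with one spare byte that I take to be for a terminating NUL.

```cpp
#include <cassert>

// Hypothetical helper wrapping the size formula from the example above.
// `leftover` is what EVP_ENCODE_CTX_num reports, `length` is the new input size.
int encode_update_capacity(int leftover, int length)
{
    assert(leftover >= 0 && length >= 0);
    const int full_blocks = (leftover + length) / 48;  // complete 48-byte input blocks
    return full_blocks * 65 + 1;                       // 65 output bytes per block, plus 1 spare
}
```

So 47 leftover bytes plus 1 new byte complete a block and need 66 bytes of room, while 47 new bytes with nothing left over only need the single spare byte, because nothing is emitted until a block is full.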