Base64 encoding and decoding with OpenSSL

2023-02-16 07:51 问答作者：

I've been trying to figure out the openssl documentation for base64开发者_开发知识库 decoding and encoding. I found some code snippets below

#include <openssl/sha.h>
#include <openssl/hmac.h>
#include <openssl/evp.h>
#include <openssl/bio.h>
#include <openssl/buffer.h>

char *base64(const unsigned char *input, int length)
{
  BIO *bmem, *b64;
  BUF_MEM *bptr;

  b64 = BIO_new(BIO_f_base64());
  bmem = BIO_new(BIO_s_mem());
  b64 = BIO_push(b64, bmem);
  BIO_write(b64, input, length);
  BIO_flush(b64);
  BIO_get_mem_ptr(b64, &bptr);

  char *buff = (char *)malloc(bptr->length);
  memcpy(buff, bptr->data, bptr->length-1);
  buff[bptr->length-1] = 0;

  BIO_free_all(b64);

  return buff;
}

char *decode64(unsigned char *input, int length)
{
  BIO *b64, *bmem;

  char *buffer = (char *)malloc(length);
  memset(buffer, 0, length);

  b64 = BIO_new(BIO_f_base64());
  bmem = BIO_new_mem_buf(input, length);
  bmem = BIO_push(b64, bmem);

  BIO_read(bmem, buffer, length);

  BIO_free_all(bmem);

  return buffer;
}

This only seems to work for single line strings such as "Start", the moment I introduce complex strings with newlines and spaces etc it fails horribly.

It doesn't even have to be openssl, a simple class or set of functions that do the same thing would be fine, theres a very complicated build process for the solution and I am trying to avoid having to go in there and make multiple changes. The only reason I went for openssl is because the solution is already compiled with the libraries.

Personally, I find the OpenSSL API to be so incredibly painful to use, I avoid it unless the cost of avoiding it is extremely high. I find it quite upsetting that it has become the standard API in the crypto world.

I was feeling bored, and I wrote you one in C++. This one should even handle the edge cases that can cause security problems, like, for example, encoding a string that results in integer overflow because it's too large.

I have done some unit testing on it, so it should work.

#include <string>
#include <cassert>
#include <limits>
#include <stdexcept>
#include <cctype>

static const char b64_table[65] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

static const char reverse_table[128] = {
   64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
   64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
   64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 62, 64, 64, 64, 63,
   52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 64, 64, 64, 64, 64, 64,
   64,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
   15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 64, 64, 64, 64, 64,
   64, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
   41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 64, 64, 64, 64, 64
};

::std::string base64_encode(const ::std::string &bindata)
{
   using ::std::string;
   using ::std::numeric_limits;

   if (bindata.size() > (numeric_limits<string::size_type>::max() / 4u) * 3u) {
      throw ::std::length_error("Converting too large a string to base64.");
   }

   const ::std::size_t binlen = bindata.size();
   // Use = signs so the end is properly padded.
   string retval((((binlen + 2) / 3) * 4), '=');
   ::std::size_t outpos = 0;
   int bits_collected = 0;
   unsigned int accumulator = 0;
   const string::const_iterator binend = bindata.end();

   for (string::const_iterator i = bindata.begin(); i != binend; ++i) {
      accumulator = (accumulator << 8) | (*i & 0xffu);
      bits_collected += 8;
      while (bits_collected >= 6) {
         bits_collected -= 6;
         retval[outpos++] = b64_table[(accumulator >> bits_collected) & 0x3fu];
      }
   }
   if (bits_collected > 0) { // Any trailing bits that are missing.
      assert(bits_collected < 6);
      accumulator <<= 6 - bits_collected;
      retval[outpos++] = b64_table[accumulator & 0x3fu];
   }
   assert(outpos >= (retval.size() - 2));
   assert(outpos <= retval.size());
   return retval;
}

::std::string base64_decode(const ::std::string &ascdata)
{
   using ::std::string;
   string retval;
   const string::const_iterator last = ascdata.end();
   int bits_collected = 0;
   unsigned int accumulator = 0;

   for (string::const_iterator i = ascdata.begin(); i != last; ++i) {
      const int c = *i;
      if (::std::isspace(c) || c == '=') {
         // Skip whitespace and padding. Be liberal in what you accept.
         continue;
      }
      if ((c > 127) || (c < 0) || (reverse_table[c] > 63)) {
         throw ::std::invalid_argument("This contains characters not legal in a base64 encoded string.");
      }
      accumulator = (accumulator << 6) | reverse_table[c];
      bits_collected += 6;
      if (bits_collected >= 8) {
         bits_collected -= 8;
         retval += static_cast<char>((accumulator >> bits_collected) & 0xffu);
      }
   }
   return retval;
}

Rather than using the BIO_ interface it's much easier to use the EVP_ interface. For instance:

#include <iostream>
#include <stdlib.h>
#include <openssl/evp.h>

char *base64(const unsigned char *input, int length) {
  const auto pl = 4*((length+2)/3);
  auto output = reinterpret_cast<char *>(calloc(pl+1, 1)); //+1 for the terminating null that EVP_EncodeBlock adds on
  const auto ol = EVP_EncodeBlock(reinterpret_cast<unsigned char *>(output), input, length);
  if (pl != ol) { std::cerr << "Whoops, encode predicted " << pl << " but we got " << ol << "\n"; }
  return output;
}

unsigned char *decode64(const char *input, int length) {
  const auto pl = 3*length/4;
  auto output = reinterpret_cast<unsigned char *>(calloc(pl+1, 1));
  const auto ol = EVP_DecodeBlock(output, reinterpret_cast<const unsigned char *>(input), length);
  if (pl != ol) { std::cerr << "Whoops, decode predicted " << pl << " but we got " << ol << "\n"; }
  return output;
}

The EVP functions include a streaming interface too, see the man page.

Here is an example of OpenSSL base64 encode/decode I wrote:

Notice, I have some macros/classes in the code that I wrote, but none of them is important for the example. It is simply some C++ wrappers I wrote:

buffer base64::encode( const buffer& data )
{
    // bio is simply a class that wraps BIO* and it free the BIO in the destructor.

    bio b64(BIO_f_base64()); // create BIO to perform base64
    BIO_set_flags(b64,BIO_FLAGS_BASE64_NO_NL);

    bio mem(BIO_s_mem()); // create BIO that holds the result

    // chain base64 with mem, so writing to b64 will encode base64 and write to mem.
    BIO_push(b64, mem);

    // write data
    bool done = false;
    int res = 0;
    while(!done)
    {
        res = BIO_write(b64, data.data, (int)data.size);

        if(res <= 0) // if failed
        {
            if(BIO_should_retry(b64)){
                continue;
            }
            else // encoding failed
            {
                /* Handle Error!!! */
            }
        }
        else // success!
            done = true;
    }

    BIO_flush(b64);

    // get a pointer to mem's data
    char* dt;
    long len = BIO_get_mem_data(mem, &dt);

    // assign data to output
    std::string s(dt, len);

    return buffer(s.length()+sizeof(char), (byte*)s.c_str());
}

This works for me, and verified no memory leaks with valgrind.

#include <openssl/bio.h>
#include <openssl/evp.h>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

#include <iostream>

namespace {
struct BIOFreeAll { void operator()(BIO* p) { BIO_free_all(p); } };
}

std::string Base64Encode(const std::vector<unsigned char>& binary)
{
    std::unique_ptr<BIO,BIOFreeAll> b64(BIO_new(BIO_f_base64()));
    BIO_set_flags(b64.get(), BIO_FLAGS_BASE64_NO_NL);
    BIO* sink = BIO_new(BIO_s_mem());
    BIO_push(b64.get(), sink);
    BIO_write(b64.get(), binary.data(), binary.size());
    BIO_flush(b64.get());
    const char* encoded;
    const long len = BIO_get_mem_data(sink, &encoded);
    return std::string(encoded, len);
}

// Assumes no newlines or extra characters in encoded string
std::vector<unsigned char> Base64Decode(const char* encoded)
{
    std::unique_ptr<BIO,BIOFreeAll> b64(BIO_new(BIO_f_base64()));
    BIO_set_flags(b64.get(), BIO_FLAGS_BASE64_NO_NL);
    BIO* source = BIO_new_mem_buf(encoded, -1); // read-only source
    BIO_push(b64.get(), source);
    const int maxlen = strlen(encoded) / 4 * 3 + 1;
    std::vector<unsigned char> decoded(maxlen);
    const int len = BIO_read(b64.get(), decoded.data(), maxlen);
    decoded.resize(len);
    return decoded;
}

int main()
{
    const char* msg = "hello";
    const std::vector<unsigned char> binary(msg, msg+strlen(msg));
    const std::string encoded = Base64Encode(binary);
    std::cout << "encoded = " << encoded << std::endl;
    const std::vector<unsigned char> decoded = Base64Decode(encoded.c_str());
    std::cout << "decoded = ";
    for (unsigned char c : decoded) std::cout << c;
    std::cout << std::endl;
    return 0;
}

Compile:

g++ -lcrypto main.cc

Output:

encoded = aGVsbG8=
decoded = hello

So many horrible C code examples with buffers and malloc(), what about using std::string properly on this C++ tagged question?

#include <openssl/bio.h>
#include <openssl/evp.h>
#include <openssl/buffer.h>
#include <string>

std::string base64_encode(const std::string& input)
{
    const auto base64_memory = BIO_new(BIO_s_mem());
    auto base64 = BIO_new(BIO_f_base64());
    base64 = BIO_push(base64, base64_memory);
    BIO_write(base64, input.c_str(), static_cast<int>(input.length()));
    BIO_flush(base64);
    BUF_MEM* buffer_memory{};
    BIO_get_mem_ptr(base64, &buffer_memory);
    auto base64_encoded = std::string(buffer_memory->data, buffer_memory->length - 1);
    BIO_free_all(base64);
    return base64_encoded;
}

I like mtrw's use of EVP.

Below is my "modern C++" take on his answer without manual memory allocation (calloc). It will take a std::string but it can easily be overloaded to use raw bytes for example.

#include <openssl/evp.h>

#include <memory>
#include <stdexcept>
#include <vector>


auto EncodeBase64(const std::string& to_encode) -> std::string {
  /// @sa https://www.openssl.org/docs/manmaster/man3/EVP_EncodeBlock.html

  const auto predicted_len = 4 * ((to_encode.length() + 2) / 3);  // predict output size

  const auto output_buffer{std::make_unique<char[]>(predicted_len + 1)};

  const std::vector<unsigned char> vec_chars{to_encode.begin(), to_encode.end()};  // convert to_encode into uchar container

  const auto output_len = EVP_EncodeBlock(reinterpret_cast<unsigned char*>(output_buffer.get()), vec_chars.data(), static_cast<int>(vec_chars.size()));

  if (predicted_len != static_cast<unsigned long>(output_len)) {
    throw std::runtime_error("EncodeBase64 error");
  }

  return output_buffer.get();
}

auto DecodeBase64(const std::string& to_decode) -> std::string {
  /// @sa https://www.openssl.org/docs/manmaster/man3/EVP_DecodeBlock.html

  const auto predicted_len = 3 * to_decode.length() / 4;  // predict output size

  const auto output_buffer{std::make_unique<char[]>(predicted_len + 1)};

  const std::vector<unsigned char> vec_chars{to_decode.begin(), to_decode.end()};  // convert to_decode into uchar container

  const auto output_len = EVP_DecodeBlock(reinterpret_cast<unsigned char*>(output_buffer.get()), vec_chars.data(), static_cast<int>(vec_chars.size()));

  if (predicted_len != static_cast<unsigned long>(output_len)) {
    throw std::runtime_error("DecodeBase64 error");
  }

  return output_buffer.get();
}

There's probably a cleaner/better way of doing this (I'd like to get rid of reinterpret_cast). You'll also definitely want a try/catch block to deal with the potential exception.

Improved TCS answer to remove macros/datastructures

unsigned char *encodeb64mem( unsigned char *data, int len, int *lenoutput )
{
// bio is simply a class that wraps BIO* and it free the BIO in the destructor.

BIO *b64 = BIO_new(BIO_f_base64()); // create BIO to perform base64
BIO_set_flags(b64, BIO_FLAGS_BASE64_NO_NL);

BIO *mem = BIO_new(BIO_s_mem()); // create BIO that holds the result

// chain base64 with mem, so writing to b64 will encode base64 and write to mem.
BIO_push(b64, mem);

// write data
bool done = false;
int res = 0;
while(!done)
{
    res = BIO_write(b64, data, len);

    if(res <= 0) // if failed
    {
        if(BIO_should_retry(b64)){
            continue;
        }
        else // encoding failed
        {
            /* Handle Error!!! */
        }
    }
    else // success!
        done = true;
}

BIO_flush(b64);

// get a pointer to mem's data
unsigned char* output;
*lenoutput = BIO_get_mem_data(mem, &output);

// assign data to output
//std::string s(dt, len2);

return output;
}

To write to file

int encodeb64(unsigned char* input, const char* filenm, int leni)
{
BIO *b64 = BIO_new(BIO_f_base64());
BIO_set_flags(b64,BIO_FLAGS_BASE64_NO_NL);

BIO *file = BIO_new_file(filenm, "w");
BIO *mem = BIO_new(BIO_f_buffer());
BIO_push(b64, mem);
BIO_push(mem, file);

// write data
bool done = false;
int res = 0;
while(!done)
{
    res = BIO_write(b64, input, leni);

    if(res <= 0) // if failed
    {
        if(BIO_should_retry(b64)){
            continue;
        }
        else // encoding failed
        {
            /* Handle Error!!! */
        }
    }
    else // success!
        done = true;
}

BIO_flush(b64);
BIO_pop(b64);
BIO_free_all(b64);

    return 0;
}

Base64 encoding from file to file. Many times due to file constraint we have read in chunks of data and do encoding. Below is the code.

int encodeb64FromFile(const char* input, const char* outputfilename)
{
BIO *b64 = BIO_new(BIO_f_base64());
BIO_set_flags(b64,BIO_FLAGS_BASE64_NO_NL);
int leni = 3*64;
unsigned char *data[3*64];
BIO *file = BIO_new_file(outputfilename, "w");
BIO *mem = BIO_new(BIO_f_buffer());
BIO_push(b64, mem);
BIO_push(mem, file);

FILE *fp = fopen(input, "rb");
while ((leni = fread(data, 1, sizeof data, fp)) > 0) {
    // write data
    bool done = false;
    int res = 0;
    while(!done)
    {
        res = BIO_write(b64, data, leni);

        if(res <= 0) // if failed
        {
            if(BIO_should_retry(b64)){
                continue;
            }
            else // encoding failed
            {
                /* Handle Error!!! */
            }
        }
        else // success!
            done = true;
    }

 }

 BIO_flush(b64);
BIO_pop(b64);
BIO_free_all(b64);
fclose(fp);

return 0;
 }

  #include <openssl/bio.h>

  typedef unsigned char byte;      

  namespace base64 {
    static void Encode(const byte* in, size_t in_len,
                       char** out, size_t* out_len) {
      BIO *buff, *b64f;
      BUF_MEM *ptr;

      b64f = BIO_new(BIO_f_base64());
      buff = BIO_new(BIO_s_mem());
      buff = BIO_push(b64f, buff);

      BIO_set_flags(buff, BIO_FLAGS_BASE64_NO_NL);
      BIO_set_close(buff, BIO_CLOSE);
      BIO_write(buff, in, in_len);
      BIO_flush(buff);

      BIO_get_mem_ptr(buff, &ptr);
      (*out_len) = ptr->length;
      (*out) = (char *) malloc(((*out_len) + 1) * sizeof(char));
      memcpy(*out, ptr->data, (*out_len));
      (*out)[(*out_len)] = '\0';

      BIO_free_all(buff);
    }

    static void Decode(const char* in, size_t in_len,
                       byte** out, size_t* out_len) {
      BIO *buff, *b64f;

      b64f = BIO_new(BIO_f_base64());
      buff = BIO_new_mem_buf((void *)in, in_len);
      buff = BIO_push(b64f, buff);
      (*out) = (byte *) malloc(in_len * sizeof(char));

      BIO_set_flags(buff, BIO_FLAGS_BASE64_NO_NL);
      BIO_set_close(buff, BIO_CLOSE);
      (*out_len) = BIO_read(buff, (*out), in_len);
      (*out) = (byte *) realloc((void *)(*out), ((*out_len) + 1) * sizeof(byte));
      (*out)[(*out_len)] = '\0';

      BIO_free_all(buff);
    }
  }

Base64 is really pretty simple; you shouldn't have trouble finding any number of implementations via a quick Google. For example here is a reference implementation in C from the Internet Software Consortium, with detailed comments explaining the process.

The openssl implementation layers a lot of complexity with the "BIO" stuff that's not (IMHO) very useful if all you're doing is decoding/encoding.

Late to the party, but I came across this problem recently myself, but was unhappy with both the BIO solution, which is unnecessarily convoluted, but did not like 'EncodeBlock' either, because it introduces newline characters I do not want in my Base64 encoded string.

After a little sniffing, I came across the header file openssl/include/crypto/evp.h which is not part of the default installation (which only exports the include/openssl folder for me), but exports the solution to the problem.

void evp_encode_ctx_set_flags(EVP_ENCODE_CTX *ctx, unsigned int flags);

/* EVP_ENCODE_CTX flags */
/* Don't generate new lines when encoding */
#define EVP_ENCODE_CTX_NO_NEWLINES          1
/* Use the SRP base64 alphabet instead of the standard one */
#define EVP_ENCODE_CTX_USE_SRP_ALPHABET     2

Using this function, the 'no newline' becomes possible using the EVP interface.

Example:

if (EVP_ENCODE_CTX *context = EVP_ENCODE_CTX_new())
{
    EVP_EncodeInit(context);
    evp_encode_ctx_set_flags(context, EVP_ENCODE_CTX_NO_NEWLINES);
    while (hasData())
    {
        uint8_t *data;
        int32_t length = fetchData(&data);
        int32_t size = (((EVP_ENCODE_CTX_num(context) + length)/48) * 65) + 1;
        uint8_t buffer[size];
        EVP_EncodeUpdate(context, buffer, &size, pData, length);
        //process encoded data.
    }
    uint8_t buffer[65];
    int32_t writtenBytes;
    EVP_EncodeFinal(context, buffer, &writtenBytes);
    //Do something with the final remainder of the encoded string.
    EVP_ENCODE_CTX_free(context);
}

This piece of code will encode the buffer to Base64 without the newlines. Please note the use of EVP_ENCODE_CTX_num to obtain the 'leftover bytes' still stored in the context object to calculate the correct buffer size.

It is only necessary, if you need to call EVP_EncodeUpdate multiple times, because your data is exceedingly large or not available at once.

继续阅读：base64 openssl

Base64 encoding and decoding with OpenSSL

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？