开发者

Escaping a C++ string

What's the easiest way to convert a C++ std::string to another std::string, which has all the unprintable characters escaped?

For example, for the string of two characters [0x61,开发者_运维问答0x01], the result string might be "a\x01" or "a%01".


Take a look at the Boost's String Algorithm Library. You can use its is_print classifier (together with its operator! overload) to pick out nonprintable characters, and its find_format() functions can replace those with whatever formatting you wish.

#include <iostream>
#include <boost/format.hpp>
#include <boost/algorithm/string.hpp>

struct character_escaper
{
    template<typename FindResultT>
    std::string operator()(const FindResultT& Match) const
    {
        std::string s;
        for (typename FindResultT::const_iterator i = Match.begin();
             i != Match.end();
             i++) {
            s += str(boost::format("\\x%02x") % static_cast<int>(*i));
        }
        return s;
    }
};

int main (int argc, char **argv)
{
    std::string s("a\x01");
    boost::find_format_all(s, boost::token_finder(!boost::is_print()), character_escaper());
    std::cout << s << std::endl;
    return 0;
}


Assumes the execution character set is a superset of ASCII and CHAR_BIT is 8. For the OutIter pass a back_inserter (e.g. to a vector<char> or another string), ostream_iterator, or any other suitable output iterator.

template<class OutIter>
OutIter write_escaped(std::string const& s, OutIter out) {
  *out++ = '"';
  for (std::string::const_iterator i = s.begin(), end = s.end(); i != end; ++i) {
    unsigned char c = *i;
    if (' ' <= c and c <= '~' and c != '\\' and c != '"') {
      *out++ = c;
    }
    else {
      *out++ = '\\';
      switch(c) {
      case '"':  *out++ = '"';  break;
      case '\\': *out++ = '\\'; break;
      case '\t': *out++ = 't';  break;
      case '\r': *out++ = 'r';  break;
      case '\n': *out++ = 'n';  break;
      default:
        char const* const hexdig = "0123456789ABCDEF";
        *out++ = 'x';
        *out++ = hexdig[c >> 4];
        *out++ = hexdig[c & 0xF];
      }
    }
  }
  *out++ = '"';
  return out;
}


Assuming that "easiest way" means short and yet easily understandable while not depending on any other resources (like libs) I would go this way:

#include <cctype>
#include <sstream>

// s is our escaped output string
std::string s = "";
// loop through all characters
for(char c : your_string)
{
    // check if a given character is printable
    // the cast is necessary to avoid undefined behaviour
    if(isprint((unsigned char)c))
        s += c;
    else
    {
        std::stringstream stream;
        // if the character is not printable
        // we'll convert it to a hex string using a stringstream
        // note that since char is signed we have to cast it to unsigned first
        stream << std::hex << (unsigned int)(unsigned char)(c);
        std::string code = stream.str();
        s += std::string("\\x")+(code.size()<2?"0":"")+code;
        // alternatively for URL encodings:
        //s += std::string("%")+(code.size()<2?"0":"")+code;
    }
}


One person's unprintable character is another's multi-byte character. So you'll have to define the encoding before you can work out what bytes map to what characters, and which of those is unprintable.


Have you seen the article about how to Generate Escaped String Output Using Spirit.Karma?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜