
Converting special characters (like \n) to their escaped versions

How do I convert, for instance, "A\r\nB\tC\nD" to "A\\r\\nB\\tC\\nD" in C(++)?

Ideally using the standard library only, and a bonus upvote for both pure C and pure C++ solutions.


Of course, replace char with wchar_t and std::string with std::wstring if you're using wide character strings.

std::string input(/* ... */);
std::string output;
for(std::string::const_iterator it = input.begin(); it != input.end(); ++it)
{
    char currentValue = *it;
    switch (currentValue)
    {
    case '\t':
        output.append("\\t");
        break;
    case '\\':
        output.append("\\\\");
        break;
    //.... etc.
    default:
        output.push_back(currentValue);
    }
}

You can do this in C, but it's more difficult because you don't know the output size in advance (though a safe worst-case bound is twice the length of the original string, plus one byte for the terminator). For example:

//Disclaimer: it's been a while since I've written pure C, so this may
//have a bug or two.
#include <stdlib.h>
#include <string.h>

const char * input = /* ... */;
size_t inputLen = strlen(input);
//Worst case: every character becomes two, plus one byte for the terminator.
char * output = malloc(inputLen * 2 + 1);
const char * inputPtr = input;
char * outputPtr = output;
do
{
    char currentValue = *inputPtr;
    switch (currentValue)
    {
    case '\t':
        *outputPtr++ = '\\';
        *outputPtr = 't';
        break;
    case '\\':
        *outputPtr++ = '\\';
        *outputPtr = '\\';
        break;
    //.... etc.
    default:
        //On the final pass this also copies the terminating '\0'.
        *outputPtr = currentValue;
    }
} while (++outputPtr, *inputPtr++);

(Remember to add error handling to the C version for things like malloc failures ;) )


Here is something I came up with...

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

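/* Returns the letter of the escape sequence for val, or 0 if val is copied as-is.
   static: a plain C99 'inline' definition may fail to link when not inlined. */
static inline char needs_escaping(char val) {
        switch(val) {
                case '\n': return 'n';
                case '\r': return 'r';
                case '\t': return 't';
        }
        return 0;
}

/* Returns a malloc'd escaped copy of in (caller frees), or NULL on failure. */
char *escape_string(const char *in) {
        size_t needed = 0, j = 0, length = strlen(in), i;
        /* First pass: count the characters that need an extra backslash. */
        for(i = 0; i < length; i++) {
                if(needs_escaping(in[i])) needed++;
        }

        char *out = malloc(length + needed + 1);
        if(out == NULL) return NULL;
        /* Second pass: copy, expanding each special character to two. */
        for(i = 0; i < length; i++) {
                char escape_val = needs_escaping(in[i]);
                if(escape_val) {
                        out[j++] = '\\';
                        out[j++] = escape_val;
                }
                else {
                        out[j++] = in[i];
                }
        }
        out[j] = '\0';
        return out;
}

int main(void) {
        const char *in  = "A\r\nB\tC\nD";
        char *out = escape_string(in);
        if(out == NULL) return EXIT_FAILURE;
        printf("%s\n", out);
        free(out);
        return 0;
}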


I doubt there's any standard library function that does this directly. The most efficient way would be simply to iterate over the input buffer character by character, conditionally copying into an output buffer, with some special state-machine logic to handle '\', etc.

I'm sure there are ways to do this with various combinations of strchr() et al, but it will probably be less efficient in the general case.
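
To illustrate the search-based alternative mentioned above, here is a minimal C++ sketch that uses std::string::find_first_of (standing in for strchr()/strcspn()) to jump from one special character to the next and copy the runs in between. The names escape_by_search and escape_letter are hypothetical, and the set of escaped characters (\n, \r, \t and the backslash) is an assumption, not something the question fixes:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: letter that follows the backslash for c, or 0 if none.
static char escape_letter(char c)
{
    switch (c)
    {
    case '\n': return 'n';
    case '\r': return 'r';
    case '\t': return 't';
    case '\\': return '\\';
    default:   return 0;
    }
}

// Hypothetical name. Copies runs of ordinary characters in bulk and
// expands each special character found by find_first_of.
std::string escape_by_search(const std::string& input)
{
    static const char special[] = "\n\r\t\\";   // assumed set of characters
    std::string output;
    output.reserve(input.size());

    std::string::size_type start = 0;
    for (;;)
    {
        std::string::size_type pos = input.find_first_of(special, start);
        if (pos == std::string::npos)
        {
            // No more special characters: copy the remaining tail.
            output.append(input, start, std::string::npos);
            break;
        }
        // Copy the ordinary run before the special character.
        output.append(input, start, pos - start);

        char letter = escape_letter(input[pos]);
        assert(letter != 0);   // every entry of 'special' must have a letter
        output.push_back('\\');
        output.push_back(letter);
        start = pos + 1;
    }
    return output;
}
```

Whether this beats the plain character-by-character loop depends on how rare the special characters are; for inputs that are mostly escapes, the repeated searches buy nothing.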


I would create a lookup table of 32 const char* literals, one for every control code (ASCII 0 to ASCII 31). I would then iterate over the original string, copying non-control chars (ASCII >= 32) to the output buffer and substituting values from the lookup table for ASCII 0 to 31.

Note 1: ASCII 0 is obviously special for C strings (not so for C++.)

Note 2: The lookup table would contain C escape sequences for codes that have them (\n, \r etc) and backslash plus hex/octal/decimal codes for those that don't.
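
A rough sketch of that lookup-table idea in C++, assuming three-digit octal escapes for the codes that have no named sequence (octal is picked here because it has a fixed maximum length, whereas a \x escape would swallow any hex digits that follow it). The names kControlEscapes and escape_control_chars are made up for illustration:

```cpp
#include <string>

// One entry per control code, ASCII 0 to 31. Codes with a named C escape
// use it; the rest use three-digit octal (an assumption of this sketch).
static const char* const kControlEscapes[32] = {
    "\\000", "\\001", "\\002", "\\003", "\\004", "\\005", "\\006", "\\a",
    "\\b",   "\\t",   "\\n",   "\\v",   "\\f",   "\\r",   "\\016", "\\017",
    "\\020", "\\021", "\\022", "\\023", "\\024", "\\025", "\\026", "\\027",
    "\\030", "\\031", "\\032", "\\033", "\\034", "\\035", "\\036", "\\037"
};

// Hypothetical name. Takes a std::string, so embedded '\0' bytes are
// handled too (see Note 1).
std::string escape_control_chars(const std::string& input)
{
    std::string output;
    output.reserve(input.size());
    for (std::string::size_type i = 0; i < input.size(); ++i)
    {
        // Go through unsigned char so bytes >= 128 never index the table.
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (c < 32)
            output.append(kControlEscapes[c]);
        else
            output.push_back(input[i]);
    }
    return output;
}
```

As described, the backslash itself (ASCII 92) is copied through unchanged, so a separate case for it would be needed if the output has to be unambiguous.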


Here's an algorithm in C#. Maybe you can treat it like pseudo-code and convert it to C++.

public static string EscapeChars(string Input)
{
    string Output = "";

    foreach (char c in Input)
    {
        switch (c)
        {
            case '\n':
                Output += "\\n";
                break;
            case '\r':
                Output += "\\r";
                break;
            case '\t':
                Output += "\\t";
                break;
            default:
                Output += c;
                break;
        }
    }
    return Output;
}
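
One possible C++ rendering of the C# above, kept deliberately close to it (same function name, same three escapes, backslash left alone). The range-based for loop assumes C++11 or later, and the reserve call is an addition of this sketch rather than part of the original algorithm:

```cpp
#include <string>

std::string EscapeChars(const std::string& input)
{
    std::string output;
    output.reserve(input.size());   // lower bound only; output may grow

    for (char c : input)
    {
        switch (c)
        {
        case '\n':
            output += "\\n";
            break;
        case '\r':
            output += "\\r";
            break;
        case '\t':
            output += "\\t";
            break;
        default:
            output += c;
            break;
        }
    }
    return output;
}
```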
