Determine if a string contains a base64 string inside of it

I'm trying to figure out a way to parse a base64 string out of a larger string.

I have the string "Hello <base64 content> World" and I want to parse out the base64 content and convert it back to a string, giving "Hello Awesome World".

Answers in C# preferred.

Edit: Updated with a more real example.

--abcdef
\n
Content-Type: Text/Plain;
Content-Transfer-Encoding: base64
\n
<base64 content>
\n
--abcdef--

This is taken from one sample. The problem is that the Content.... lines vary quite a bit from one record to the next.


There is no reliable way to do it. How would you know that, for instance, "Hello" is not a base64 string? OK, that's a bad example, because base64 is supposed to be padded so that the length is a multiple of 4. But what about "overflow"? It is 8 characters long and it is a valid base64 string (its decoded bytes, read as Latin-1, come out as "¢÷«~Z0"), even though it's obviously a normal word to a human reader. There is just no way to tell for sure whether a word is a normal word or base64-encoded text.
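
To make the "overflow" point concrete, here is a minimal sketch that checks whether a word happens to parse as base64 and, if so, shows the raw bytes as hex. The class and method names (OverflowDemo, TryDecodeToHex) are made up for illustration; only Convert.FromBase64String and BitConverter.ToString are framework calls.

```csharp
using System;

public static class OverflowDemo
{
    // Hypothetical helper: reports whether 'candidate' parses as base64 and,
    // if it does, returns the decoded bytes formatted as hex (e.g. "A2-F7").
    public static bool TryDecodeToHex(string candidate, out string hex)
    {
        try
        {
            byte[] bytes = Convert.FromBase64String(candidate);
            hex = BitConverter.ToString(bytes);
            return true;
        }
        catch (FormatException)
        {
            // Wrong length, bad padding, or a character outside the alphabet.
            hex = string.Empty;
            return false;
        }
    }
}
```

With this helper, "overflow" decodes successfully to the bytes A2-F7-AB-7E-5A-30, while "Hello" is rejected only because its length is not a multiple of 4.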

The fact that you have base64-encoded text embedded in normal text is clearly a design mistake; I suggest you do something about it rather than trying to do something impossible...


In short form you could:

  • split the string on any chars that are not valid base64 data or padding
  • try to convert each token
  • if the conversion succeeds, call replace on the original string to switch the token with the converted value

In code:

var delimiters = new char[] { /* non-base64 ASCII chars */ };
var possibles = value.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
//need to tweak to include padding chars in matches, but still split on padding?
//maybe better off creating a regex to match base64 + padding
//and using Regex.Split?

foreach(var match in possibles)
{
    try
    {
        var converted = Convert.FromBase64String(match);
        var text = System.Text.Encoding.UTF8.GetString(converted);
        if(!string.IsNullOrEmpty(text))
        {
            value = value.Replace(match, text);
        }
    } 
    catch (System.ArgumentNullException) 
    {
        //handle it
    }
    catch (System.FormatException) 
    {
        //handle it
    }
}
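
The regex idea mentioned in the comments above could look roughly like the sketch below. It is not a definitive implementation: the class name Base64TokenDecoder and the method DecodeTokens are invented here, and it assumes the encoded payload is UTF-8 text. Under that assumption, a strict UTF-8 decoder can reject tokens such as "overflow" whose bytes are not valid UTF-8. This reduces false positives but cannot eliminate them.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class Base64TokenDecoder
{
    // Assumption: the payload is UTF-8 text. This encoder throws on invalid bytes.
    private static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // A run of base64 alphabet characters followed by at most two '=' padding chars.
    private static readonly Regex Candidate = new Regex(@"[A-Za-z0-9+/]+={0,2}");

    // Replaces every token that decodes cleanly; all other text is left untouched.
    public static string DecodeTokens(string value)
    {
        return Candidate.Replace(value, match =>
        {
            string token = match.Value;
            if (token.Length % 4 != 0)
            {
                return token; // cannot be padded base64
            }
            try
            {
                byte[] bytes = Convert.FromBase64String(token);
                return StrictUtf8.GetString(bytes);
            }
            catch (FormatException)
            {
                return token; // not base64 after all
            }
            catch (DecoderFallbackException)
            {
                return token; // valid base64, but the bytes are not UTF-8 text
            }
        });
    }
}
```

For the question's example, DecodeTokens("Hello QXdlc29tZQ== World") returns "Hello Awesome World": "Hello" and "World" are skipped because of their length, and only the middle token decodes.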

Without a delimiter, though, you can end up converting non-base64 text that happens to also be valid base64-encoded text.

Looking at your example of trying to convert "Hello QXdlc29tZQ== World" to "Hello Awesome World", the above algorithm could easily generate something like "ée¡Ý•Í½µ”¢¹]" by trying to convert the whole string from base64, since there is no delimiter between plain and encoded text.

Update (based on comments):

If there are no '\n's in the base64 content and it is always preceded by "Content-Transfer-Encoding: base64\n", then there is a way:

  • split the string on '\n'
  • iterate over all the tokens until a token ends in "Content-Transfer-Encoding: base64"
  • the next token (if there is one) should be decoded (if possible) and then the replacement should be made in the original string
  • return to iterating until out of tokens

In code:

private string ConvertMixedUpTextAndBase64(string value)
{
    var delimiters = new char[] { '\n' };
    var possibles = value.Split(delimiters, 
                                StringSplitOptions.RemoveEmptyEntries);

    for (int i = 0; i < possibles.Length - 1; i++)
    {
        if (possibles[i].EndsWith("Content-Transfer-Encoding: base64"))
        {
            var nextTokenPlain = DecodeBase64(possibles[i + 1]);
            if (!string.IsNullOrEmpty(nextTokenPlain))
            {
                value = value.Replace(possibles[i + 1], nextTokenPlain);
                i++;
            }
        }                
    }
    return value;
}

private string DecodeBase64(string text)
{
    string result = null;
    try
    {
        var converted = Convert.FromBase64String(text);
        result = System.Text.Encoding.UTF8.GetString(converted);
    }
    catch (System.ArgumentNullException)
    {
        //handle it
    }
    catch (System.FormatException)
    {
        //handle it
    }
    return result;
}
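
The code above assumes the base64 body is a single line ending in '\n'. MIME bodies are normally wrapped (RFC 2045 limits encoded lines to 76 characters) and often use "\r\n", so a variant that tolerates both may be useful. The sketch below is one possible approach, not a tested parser. The names MimeBase64Sketch and DecodeBase64Bodies are hypothetical, and it assumes that any further header lines contain ':' and that the body ends at a blank line or at a boundary line starting with "--".

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class MimeBase64Sketch
{
    public static string DecodeBase64Bodies(string value)
    {
        // Normalize line endings so a trailing '\r' cannot hide the header.
        string[] lines = value.Replace("\r\n", "\n").Split('\n');
        var output = new List<string>();
        int i = 0;
        while (i < lines.Length)
        {
            string line = lines[i++];
            output.Add(line);
            if (!line.Trim().EndsWith("Content-Transfer-Encoding: base64",
                                      StringComparison.OrdinalIgnoreCase))
            {
                continue;
            }
            // Copy any remaining headers (assumed to contain ':') and blank lines.
            while (i < lines.Length &&
                   (lines[i].Trim().Length == 0 || lines[i].Contains(":")))
            {
                output.Add(lines[i++]);
            }
            // Gather the wrapped body up to a blank line or a "--" boundary.
            int start = i;
            var body = new StringBuilder();
            while (i < lines.Length && lines[i].Trim().Length > 0 &&
                   !lines[i].StartsWith("--", StringComparison.Ordinal))
            {
                body.Append(lines[i++].Trim());
            }
            string decoded;
            if (body.Length > 0 && TryDecode(body.ToString(), out decoded))
            {
                output.Add(decoded);
            }
            else
            {
                // Decoding failed: keep the original lines untouched.
                for (int j = start; j < i; j++) output.Add(lines[j]);
            }
        }
        return string.Join("\n", output);
    }

    private static bool TryDecode(string text, out string decoded)
    {
        try
        {
            decoded = Encoding.UTF8.GetString(Convert.FromBase64String(text));
            return true;
        }
        catch (FormatException)
        {
            decoded = string.Empty;
            return false;
        }
    }
}
```

Note that this variant returns the text with '\n' line endings even if the input used "\r\n", and it replaces the body by position rather than with string.Replace, so identical text elsewhere in the input is not affected.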