C# ASCII.GetBytes: how do I set which character is used for unconvertible characters?
I am porting some code from native C++ to C#. When ASCII.GetBytes encounters a Unicode character it cannot represent, it returns the character with decimal value 63 (a question mark). My C++ code, which uses WideCharToMultiByte(CP_ACP, ...), instead produces the character with decimal value 37 (a percent sign) when it encounters a character it doesn't know.
My question is: how can I make ASCII.GetBytes return #37 instead of #63 for unknown characters?
In C#, you can use the DecoderFallback/EncoderFallback of an encoding to decide how it will behave. You can't change the fallback of Encoding.ASCII itself, but you can clone it and then set the fallback. Here's an example:
using System;
using System.Text;

class Test
{
    static void Main()
    {
        // Encoding.ASCII is read-only, so work on a writable clone.
        Encoding asciiClone = (Encoding) Encoding.ASCII.Clone();
        asciiClone.DecoderFallback = new DecoderReplacementFallback("%");
        asciiClone.EncoderFallback = new EncoderReplacementFallback("%");

        // Decoding: byte 200 is outside the ASCII range.
        byte[] bytes = { 65, 200, 66 };
        string text = asciiClone.GetString(bytes);
        Console.WriteLine(text); // Prints A%B

        // Encoding: U+00FF has no ASCII equivalent.
        bytes = asciiClone.GetBytes("A\u00ffB");
        Console.WriteLine(bytes[1]); // Prints 37
    }
}
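As an alternative to cloning, Encoding.GetEncoding has an overload that accepts the two fallbacks directly. The following is a minimal sketch of that variant; the class name PercentAscii and its Encode/Decode helpers are made up for illustration and are not part of the framework.

```csharp
using System;
using System.Text;

// Hypothetical helper class: wraps an ASCII encoding whose fallbacks
// substitute '%' instead of the default '?'.
public static class PercentAscii
{
    // GetEncoding(name, encoderFallback, decoderFallback) returns an
    // encoding instance with the given fallbacks already applied.
    public static readonly Encoding Instance = Encoding.GetEncoding(
        "us-ascii",
        new EncoderReplacementFallback("%"),
        new DecoderReplacementFallback("%"));

    // Characters outside ASCII are encoded as byte 37 ('%').
    public static byte[] Encode(string text)
    {
        return Instance.GetBytes(text);
    }

    // Bytes above 127 are decoded as '%'.
    public static string Decode(byte[] bytes)
    {
        return Instance.GetString(bytes);
    }
}
```

With this in place, PercentAscii.Encode("A\u00ffB") yields the bytes 65, 37, 66, and PercentAscii.Decode(new byte[] { 65, 200, 66 }) yields "A%B", matching the cloned-encoding example above.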
Presumably the C++ code calls WideCharToMultiByte with lpDefaultChar = "%". There's no way to pass this into the Encoding.GetBytes call, but you could call WideCharToMultiByte using P/Invoke.
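A rough sketch of what that P/Invoke route might look like follows. It is Windows-only, and the NativeAnsi class and ToAnsi method are hypothetical names chosen for illustration. It assumes the original C++ passes no flags and uses CP_ACP, which is the system ANSI code page rather than strict ASCII, so a character such as U+00FF may be converted to a real byte instead of the default character, depending on the machine's code page. It also assumes the active code page is not UTF-8, because WideCharToMultiByte rejects a non-null lpDefaultChar for CP_UTF8.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

// Hypothetical wrapper around the Win32 WideCharToMultiByte function.
public static class NativeAnsi
{
    private const uint CP_ACP = 0; // system default ANSI code page

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern int WideCharToMultiByte(
        uint codePage,
        uint flags,
        [MarshalAs(UnmanagedType.LPWStr)] string wideChars,
        int wideCharCount,
        [Out] byte[] multiByteBuffer,
        int multiByteCount,
        byte[] defaultChar,
        [MarshalAs(UnmanagedType.Bool)] out bool usedDefaultChar);

    // Converts text to the ANSI code page, substituting `replacement`
    // for characters the code page cannot represent.
    public static byte[] ToAnsi(string text, byte replacement)
    {
        // The native call fails on a zero-length input, so handle it here.
        if (text.Length == 0)
        {
            return new byte[0];
        }

        byte[] defaultChar = { replacement, 0 };
        bool usedDefault;

        // First call: null buffer and size 0 asks for the required byte count.
        int size = WideCharToMultiByte(
            CP_ACP, 0, text, text.Length, null, 0, defaultChar, out usedDefault);
        if (size == 0)
        {
            throw new Win32Exception(Marshal.GetLastWin32Error());
        }

        // Second call: perform the actual conversion into the buffer.
        byte[] result = new byte[size];
        int written = WideCharToMultiByte(
            CP_ACP, 0, text, text.Length, result, size, defaultChar, out usedDefault);
        if (written == 0)
        {
            throw new Win32Exception(Marshal.GetLastWin32Error());
        }

        return result;
    }
}
```

A call such as NativeAnsi.ToAnsi(someString, (byte)'%') would then mirror the C++ behaviour under the assumptions above. If strict ASCII output is what is actually needed, the managed fallback approach is likely the simpler choice.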