Unicode-to-string conversion in C#
How can I convert a Unicode value to its equivalent string?
For example, I have "రమెశ్", and I need a function that accepts this Unicode value and returns a string.
I was looking at the System.Text.Encoding.Convert() function, but that does not take in a Unicode value; it takes two encoding开发者_开发百科s and a byte array.
I bascially have a byte array that I need to save in a string field and then come back later and convert the string first back to a byte array.
So I use ByteConverter.GetString(byteArray) to save the byte array to a string, but I can't get it back to a byte array.
Use .ToString();
:
this.Text = ((char)0x00D7).ToString();
Try the following:
byte[] bytes = ...;
string convertedUtf8 = Encoding.UTF8.GetString(bytes);
string convertedUtf16 = Encoding.Unicode.GetString(bytes); // For UTF-16
The other way around is using `GetBytes():
byte[] bytesUtf8 = Encoding.UTF8.GetBytes(convertedUtf8);
byte[] bytesUtf16 = Encoding.Unicode.GetBytes(convertedUtf16);
In the Encoding
class, there are more variants if you need them.
To convert a string to a Unicode string, do it like this: very simple... note the BytesToString function which avoids using any inbuilt conversion stuff. Fast, too.
private string BytesToString(byte[] Bytes)
{
MemoryStream MS = new MemoryStream(Bytes);
StreamReader SR = new StreamReader(MS);
string S = SR.ReadToEnd();
SR.Close();
return S;
}
private string ToUnicode(string S)
{
return BytesToString(new UnicodeEncoding().GetBytes(S));
}
UTF8Encoding
Class
UTF8Encoding uni = new UTF8Encoding();
Console.WriteLine( uni.GetString(new byte[] { 1, 2 }));
There are different types of encoding. You can try some of them to see if your bytestream get converted correctly:
System.Text.ASCIIEncoding encodingASCII = new System.Text.ASCIIEncoding();
System.Text.UTF8Encoding encodingUTF8 = new System.Text.UTF8Encoding();
System.Text.UnicodeEncoding encodingUNICODE = new System.Text.UnicodeEncoding();
var ascii = string.Format("{0}: {1}", encodingASCII.ToString(), encodingASCII.GetString(textBytesASCII));
var utf = string.Format("{0}: {1}", encodingUTF8.ToString(), encodingUTF8.GetString(textBytesUTF8));
var unicode = string.Format("{0}: {1}", encodingUNICODE.ToString(), encodingUNICODE.GetString(textBytesCyrillic));
Have a look here as well: http://george2giga.com/2010/10/08/c-text-encoding-and-transcoding/.
var ascii = $"{new ASCIIEncoding().ToString()}: {((ASCIIEncoding)new ASCIIEncoding()).GetString(textBytesASCII)}";
var utf = $"{new UTF8Encoding().ToString()}: {((UTF8Encoding)new UTF8Encoding()).GetString(textBytesUTF8)}";
var unicode = $"{new UnicodeEncoding().ToString()}: {((UnicodeEncoding)new UnicodeEncoding()).GetString(textBytesCyrillic)}";
Wrote a cycle for converting unicode symbols in string to UTF8 letters:
string stringWithUnicodeSymbols = @"{""id"": 10440119, ""photo"": 10945418, ""first_name"": ""\u0415\u0432\u0433\u0435\u043d\u0438\u0439""}";
var splitted = Regex.Split(stringWithUnicodeSymbols, @"\\u([a-fA-F\d]{4})");
string outString = "";
foreach (var s in splitted)
{
try
{
if (s.Length == 4)
{
var decoded = ((char) Convert.ToUInt16(s, 16)).ToString();
outString += decoded;
}
else
{
outString += s;
}
}
catch (Exception e)
{
outString += s;
}
}
精彩评论