Unicode-to-string conversion in C#

How can I convert a Unicode value to its equivalent string?

For example, I have "రమెశ్", and I need a function that accepts this Unicode value and returns a string.

I was looking at the System.Text.Encoding.Convert() function, but that does not take in a Unicode value; it takes two encodings and a byte array.

I basically have a byte array that I need to save in a string field, and later I need to convert that string back to a byte array.

So I use ByteConverter.GetString(byteArray) to save the byte array to a string, but I can't get it back to a byte array.


Cast the code point to char and call ToString():

this.Text = ((char)0x00D7).ToString();
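
The cast above works for any code point up to U+FFFF. As a minimal sketch (the class and method names here are illustrative and not from the question), char.ConvertFromUtf32 also covers code points beyond that range, which need two UTF-16 code units:

```csharp
using System;

public static class CodePointDemo
{
    // Hypothetical helper: turns a Unicode code point into a string.
    public static string FromCodePoint(int codePoint)
    {
        // For code points above U+FFFF this returns a surrogate pair,
        // which a plain (char) cast cannot represent.
        return char.ConvertFromUtf32(codePoint);
    }
}
```

With this helper, FromCodePoint(0x00D7) gives the same one-character string as the cast, while FromCodePoint(0x1F600) gives a string whose Length is 2.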


Try the following:

byte[] bytes = ...;

string convertedUtf8 = Encoding.UTF8.GetString(bytes);
string convertedUtf16 = Encoding.Unicode.GetString(bytes); // For UTF-16

To go the other way, use GetBytes():

byte[] bytesUtf8 = Encoding.UTF8.GetBytes(convertedUtf8);
byte[] bytesUtf16 = Encoding.Unicode.GetBytes(convertedUtf16);

In the Encoding class, there are more variants if you need them.
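
One caveat: GetString followed by GetBytes only round-trips when the bytes were valid text in that encoding to begin with. If the byte array holds arbitrary binary data, as the question suggests it might, invalid sequences are replaced during decoding and the original bytes cannot be recovered. A minimal sketch of a lossless alternative using Base64 (the class and method names are illustrative):

```csharp
using System;

public static class BinaryText
{
    // Encodes arbitrary bytes as an ASCII-only string that is safe to store in a string field.
    public static string ToText(byte[] data) => Convert.ToBase64String(data);

    // Recovers the exact original bytes from a string produced by ToText.
    public static byte[] FromText(string text) => Convert.FromBase64String(text);
}
```

For example, ToText(new byte[] { 1, 2, 3 }) returns "AQID", and FromText("AQID") returns the same three bytes. The trade-off is size: the Base64 string is roughly a third longer than the raw data.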


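To run a string through UTF-16 ("Unicode" in .NET terminology) bytes and back, do it like this. Note that BytesToString still relies on built-in decoding: StreamReader assumes UTF-8 unless told otherwise, so the encoding has to be passed explicitly for UTF-16 bytes to decode correctly.

// Requires: using System.IO; using System.Text;
private string BytesToString(byte[] bytes)
{
  // Decode as UTF-16 LE; without the explicit encoding, StreamReader would assume UTF-8.
  using (MemoryStream ms = new MemoryStream(bytes))
  using (StreamReader sr = new StreamReader(ms, Encoding.Unicode))
  {
    return sr.ReadToEnd();
  }
}

private string ToUnicode(string s)
{
  // UnicodeEncoding is UTF-16 little-endian by default.
  return BytesToString(new UnicodeEncoding().GetBytes(s));
}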


Use the UTF8Encoding class:

UTF8Encoding utf8 = new UTF8Encoding();
Console.WriteLine(utf8.GetString(new byte[] { 1, 2 }));


There are different types of encoding. You can try several of them to see which one converts your byte stream correctly:

System.Text.ASCIIEncoding encodingASCII = new System.Text.ASCIIEncoding();
System.Text.UTF8Encoding encodingUTF8 = new System.Text.UTF8Encoding();
System.Text.UnicodeEncoding encodingUNICODE = new System.Text.UnicodeEncoding();

var ascii = string.Format("{0}: {1}", encodingASCII.ToString(), encodingASCII.GetString(textBytesASCII));
var utf =   string.Format("{0}: {1}", encodingUTF8.ToString(), encodingUTF8.GetString(textBytesUTF8));
var unicode = string.Format("{0}: {1}", encodingUNICODE.ToString(), encodingUNICODE.GetString(textBytesCyrillic));

Have a look here as well: http://george2giga.com/2010/10/08/c-text-encoding-and-transcoding/.
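
The byte arrays in the snippet above are not defined there. As a self-contained sketch of the same comparison (the class and method names are illustrative), the helper below encodes a string with a given encoding and decodes it again, which shows quickly whether that encoding can represent the text:

```csharp
using System;
using System.Text;

public static class EncodingComparison
{
    // Encodes the text with the given encoding, then decodes the bytes with the same encoding.
    // The result equals the input only if the encoding can represent every character.
    public static string RoundTrip(Encoding encoding, string text)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }
}
```

RoundTrip(Encoding.UTF8, "Привет") and RoundTrip(Encoding.Unicode, "Привет") both return the original text, while RoundTrip(Encoding.ASCII, "Привет") returns "??????" because ASCII has no Cyrillic characters and substitutes a question mark for each one.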


The same comparison with string interpolation:

var ascii = $"{new ASCIIEncoding().ToString()}: {new ASCIIEncoding().GetString(textBytesASCII)}";
var utf = $"{new UTF8Encoding().ToString()}: {new UTF8Encoding().GetString(textBytesUTF8)}";
var unicode = $"{new UnicodeEncoding().ToString()}: {new UnicodeEncoding().GetString(textBytesCyrillic)}";


I wrote a loop that converts \uXXXX escape sequences in a string into the characters they represent:

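// Requires: using System; using System.Text; using System.Text.RegularExpressions;
string stringWithUnicodeSymbols = @"{""id"": 10440119, ""photo"": 10945418, ""first_name"": ""\u0415\u0432\u0433\u0435\u043d\u0438\u0439""}";
// Splitting on a pattern with one capturing group yields alternating pieces:
// literal text at even indices, the captured hex digits at odd indices.
var parts = Regex.Split(stringWithUnicodeSymbols, @"\\u([0-9a-fA-F]{4})");
var builder = new StringBuilder();
for (int i = 0; i < parts.Length; i++)
{
    if (i % 2 == 1)
    {
        // Captured hex digits: convert them to the corresponding UTF-16 code unit.
        builder.Append((char) Convert.ToUInt16(parts[i], 16));
    }
    else
    {
        // Literal text between escapes is copied unchanged, even if it is 4 characters long.
        builder.Append(parts[i]);
    }
}
string outString = builder.ToString();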
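
The loop above can also be written as a single Regex.Replace call with a match evaluator. This is a sketch with illustrative names, and it assumes the input contains only simple \uXXXX escapes; it does not handle other JSON escaping rules such as an escaped backslash before the u, for which a real JSON parser is the safer choice:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class UnicodeEscapes
{
    // Replaces every \uXXXX escape with the UTF-16 code unit it denotes.
    public static string Decode(string input)
    {
        return Regex.Replace(
            input,
            @"\\u([0-9a-fA-F]{4})",
            // Group 1 holds the four hex digits; parse them and emit the character.
            m => ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());
    }
}
```

Calling UnicodeEscapes.Decode on the first_name value from the example string yields "Евгений".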