
How do I convert a C# string from one character encoding to another?

According to Spolsky I can't call myself a developer, so there is a lot of shame behind this question...

Scenario: From a C# application, I would like to take a string value from a SQL db and use it as the name of a directory. I have a secure (SSL) FTP server on which I want to set the current directory using the string value from the DB.

Problem: Everything is working fine until I hit a string value with a "special" character - I seem unable to encode the directory name correctly to satisfy the FTP server.

The code example below

  • uses "special" character é as an example
  • uses WinSCP as an external application for the ftps comms
  • does not show all the code required to set up the Process "_winscp"
  • sends commands to the WinSCP exe by writing to the process's StandardInput
  • for simplicity, does not get the info from the DB, but instead simply declares a string (but I did do a .Equals to confirm that the value from the DB is the same as the declared string)
  • makes three attempts to set the current directory on the FTP server using different string encodings - all of which fail
  • makes an attempt to set the directory using a string that was created from a hand-crafted byte array - which works

Process _winscp = new Process();
byte[] buffer;

string nameFromString = "Sinéad O'Connor";
_winscp.StandardInput.WriteLine("cd \"" + nameFromString + "\"");

buffer = Encoding.UTF8.GetBytes(nameFromString);
_winscp.StandardInput.WriteLine("cd \"" + Encoding.UTF8.GetString(buffer) + "\"");

buffer = Encoding.ASCII.GetBytes(nameFromString);
_winscp.StandardInput.WriteLine("cd \"" + Encoding.ASCII.GetString(buffer) + "\"");

byte[] nameFromBytes = new byte[] { 83, 105, 110, 130, 97, 100, 32, 79, 39, 67, 111, 110, 110, 111, 114 };
_winscp.StandardInput.WriteLine("cd \"" + Encoding.Default.GetString(nameFromBytes) + "\"");

The UTF8 encoding changes é to 101 (decimal) but the FTP server doesn't like it.

The ASCII encoding changes é to 63 (decimal) but the FTP server doesn't like it.

When I represent é as value 130 (decimal) the FTP server is happy, except I can't find a method that will do this for me (I had to manually construct the string from explicit bytes).

Anyone know what I should do to my string to encode the é as 130 and make the FTP server happy and finally elevate me to level 1 developer by explaining the only single thing a developer should understand?


130 isn't ASCII. ASCII is only 7 bits (see the Encoding.ASCII documentation), so Encoding.ASCII replaces the "é" with a plain "?" (decimal 63) because it has no way to represent it. UTF-8 encodes the character as two bytes (decimal: 195 and 169), and decoding those bytes gives back the same code point.
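To see the byte values described above for yourself, here is a minimal sketch. The class and method names (EncodingBytesSketch, ToDecimal) are made up for illustration; only Encoding.ASCII, Encoding.UTF8, GetBytes and string.Join are real framework members.

```csharp
using System;
using System.Text;

public static class EncodingBytesSketch
{
    // Hypothetical helper: encodes the text and returns the bytes
    // as space-separated decimal values.
    public static string ToDecimal(Encoding encoding, string text)
    {
        byte[] bytes = encoding.GetBytes(text);
        return string.Join(" ", bytes);
    }
}
```

Calling EncodingBytesSketch.ToDecimal(Encoding.ASCII, "é") should give "63", and the same call with Encoding.UTF8 should give "195 169". Neither produces the 130 the FTP server accepts.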

Use an encoding for an explicit code page, such as Latin (CP 1252). It needs to match whatever the other side expects. As the output below shows, CP 1252 encodes é as 233, not 130, so it is not the encoding you need here :-) But the same principle applies: use an encoding for a specific code page.

Edit: As Hans Passant explained in a comment, the code page to use here is MS-DOS (CP 437), which encodes é as the desired 130.

// LINQPad -- Encoding is System.Text.Encoding
var enc = Encoding.GetEncoding(1252);
string.Join(" ", enc.GetBytes("Sinéad O'Connor")).Dump();
// -> 83 105 110 233 97 100 32 79 39 67 111 110 110 111 114
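For comparison, here is the same idea with CP 437, as a sketch. The names Cp437Sketch, GetCp437 and ToDecimal are hypothetical. One assumption to flag: on .NET Core / .NET 5+ the legacy code pages are only available after registering CodePagesEncodingProvider, while on .NET Framework Encoding.GetEncoding(437) works without that line.

```csharp
using System;
using System.Text;

public static class Cp437Sketch
{
    // Hypothetical helper that returns the MS-DOS (CP 437) encoding.
    public static Encoding GetCp437()
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // must be registered first. Registering more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(437);
    }

    // Encodes the text with CP 437 and returns the bytes in decimal.
    public static string ToDecimal(string text)
    {
        return string.Join(" ", GetCp437().GetBytes(text));
    }
}
```

Cp437Sketch.ToDecimal("Sinéad O'Connor") should now contain 130 where the CP 1252 output has 233, which matches the hand-crafted byte array in the question.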

See: http://msdn.microsoft.com/en-us/goglobal/bb688114 for more.

Happy coding.

Btw. good selection in artists -- if it was intentional :p


I think the problem here is that ALL .NET strings are Unicode. A .NET string has no notion of "which encoding am I in". So with Encoding.ASCII.GetString(buffer) you simply convert your ASCII bytes back into a Unicode string.
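A small sketch of why the GetBytes/GetString round trips in the question change nothing useful (RoundTripSketch and RoundTrip are names invented for this example):

```csharp
using System;
using System.Text;

public static class RoundTripSketch
{
    // Encodes the text to bytes and immediately decodes it again.
    // The result is always an ordinary Unicode .NET string.
    public static string RoundTrip(Encoding encoding, string text)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }
}
```

With Encoding.UTF8 the round trip returns the original string unchanged; with Encoding.ASCII the é is lost and comes back as "?". Either way the string written to StandardInput is still Unicode, and the bytes that reach WinSCP are decided by the stream's encoding, not by these calls.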

I think your problem should be solved by changing the encoding used for Process.StandardInput, so that WinSCP receives the bytes in the encoding it expects.
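One way this could look, as a sketch only: wrap the raw stdin stream in a StreamWriter with an explicit encoding. StdinEncodingSketch and WriteCommand are hypothetical names, and whether WinSCP really wants CP 437 on its stdin is an assumption based on the 130 byte from the question.

```csharp
using System;
using System.IO;
using System.Text;

public static class StdinEncodingSketch
{
    // Writes one command line to the given stream using an explicit encoding.
    public static void WriteCommand(Stream stdin, string command, Encoding encoding)
    {
        // leaveOpen: true, so disposing the writer flushes the bytes
        // but does not close the underlying pipe.
        using (var writer = new StreamWriter(stdin, encoding, 1024, true))
        {
            writer.WriteLine(command);
        }
    }
}
```

In the question's code this would be called with _winscp.StandardInput.BaseStream as the first argument and Encoding.GetEncoding(437) as the last. Newer .NET versions also have a ProcessStartInfo.StandardInputEncoding property that may be a simpler option, if it is available on your target framework.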

OR

You should check what Encoding.Default is, because I'm pretty sure it's not UTF8 or ASCII.
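A quick way to check, sketched with a made-up helper (EncodingInfoSketch.Describe); WebName and CodePage are real properties of System.Text.Encoding:

```csharp
using System;
using System.Text;

public static class EncodingInfoSketch
{
    // Returns a short description such as "utf-8 (code page 65001)".
    public static string Describe(Encoding encoding)
    {
        return encoding.WebName + " (code page " + encoding.CodePage + ")";
    }
}
```

Printing EncodingInfoSketch.Describe(Encoding.Default) shows what your machine and runtime actually use. On .NET Framework that is typically the system's ANSI code page, while .NET Core and later report UTF-8.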
