
Convert Korean Text to Unicode

My question is pretty simple. I have an HTML document that is hosted in a WebBrowser control.

Now, when I select a Korean word using the MSHTML range property, I am able to get range.htmlText and range.Text. They both show the Korean word. All I want to do is convert it to Unicode format.

Is it possible?

FYI I am doing all this using C# WinForms.


Could you provide a little more information? What format is the "Korean word" in when you read it? (I assume the same as the HTML document header.) Could you post a sample HTML page from which you are trying to read?

If the problem is simply that the string you are getting is in a different code page, you can use the Encoding classes in .NET to convert it. For example, perhaps your text is in iso-2022-kr. Here is a sample that converts your string, called "stringInKoreanIsoEncoding" in the code below:

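using System.Text;
// 50225 is the code page for iso-2022-kr (on .NET Core and later, register CodePagesEncodingProvider first)
Encoding koreanEncoding = Encoding.GetEncoding(50225);
// Re-encode the string's iso-2022-kr bytes as UTF-8, then decode those bytes back into a string
byte[] convertedToUtf8 = Encoding.Convert(koreanEncoding, Encoding.UTF8, koreanEncoding.GetBytes(stringInKoreanIsoEncoding));
string utf8String = Encoding.UTF8.GetString(convertedToUtf8);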
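
One more thing to consider: a .NET string is already stored as UTF-16, so if the word displays correctly in range.Text it is most likely valid Unicode already. If by "Unicode format" you mean the code points or the encoded bytes, something like the rough sketch below may be closer to what you need. The class name KoreanUnicodeDemo, the helper ToUnicodeEscapes, and the sample word are my own placeholders for illustration, not anything from MSHTML or your code.

```csharp
using System;
using System.Linq;
using System.Text;

static class KoreanUnicodeDemo
{
    // Hypothetical helper: formats each UTF-16 code unit of the string as a \uXXXX escape.
    public static string ToUnicodeEscapes(string text) =>
        string.Concat(text.Select(c => $"\\u{(int)c:X4}"));

    static void Main()
    {
        // "\uD55C\uAE00" is the word "한글"; it stands in for the value read from range.Text.
        string word = "\uD55C\uAE00";

        // Code points as escapes: \uD55C\uAE00
        Console.WriteLine(ToUnicodeEscapes(word));

        // Raw bytes in two common Unicode encodings.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(word);     // UTF-8
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(word); // UTF-16, little-endian

        Console.WriteLine(BitConverter.ToString(utf8Bytes));  // ED-95-9C-EA-B8-80
        Console.WriteLine(BitConverter.ToString(utf16Bytes)); // 5C-D5-00-AE
    }
}
```

Which of these you want depends on what the consumer of the text expects: escapes for embedding in source or JSON, UTF-8 bytes for files and HTTP, UTF-16 bytes for most Windows APIs.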