
How to encode Japanese characters

I have to develop a program that works as an encoding system.

I have the following Japanese text:

つれづれなるまゝに、日暮らし、硯にむかひて、心にうつりゆくよしなし事を、そこはかとなく書きつくれば、あやしうこそものぐるほしけれ

I want to convert this string to an encoding like this:

%26%2312388%3B%26%2312428%3B%26%2312389%3B%26%2312428%3B%26%2312394%3B%26%2312427%3B%26%2312414%3B%26%2312445%3B%26%2312395%3B%26%2312289%3B%26%2326085%3B%26%2326286%3B%26%2312425%3B%26%2312375%3B%26%2312289%3B%26%2330831%3B%26%2312395%3B%26%2312416%3B%26%2312363%3B%26%2312402%3B%26%2312390%3B%26%2312289%3B%26%2324515%3B%26%2312395%3B%26%2312358%3B%26%2312388%3B%26%2312426%3B%26%2312422%3B%26%2312367%3B%26%2312424%3B%26%2312375%3B%26%2312394%3B%26%2312375%3B%26%2320107%3B%26%2312434%3B%26%2312289%3B%26%2312381%3B%26%2312371%3B%26%2312399%3B%26%2312363%3B%26%2312392%3B%26%2312394%3B%26%2312367%3B%26%2326360%3B%26%2312365%3B%26%2312388%3B%26%2312367%3B%26%2312428%3B%26%2312400%3B%26%2312289%3B%26%2312354%3B%26%2312420%3B%26%2312375%3B%26%2312358%3B%26%2312371%3B%26%2312381%3B%26%2312418%3B%26%2312398%3B%26%2312368%3B%26%2312427%3B%26%2312411%3B%26%2312375%3B%26%2312369%3B%26%2312428%3B%26%2312290%3B.

How can I do that?


I believe you are looking for HttpUtility.UrlEncode, although I can't figure out which encoding produces exactly the output you show.

var testString = "つれづれなるまゝに、日暮らし、硯にむかひて、心にうつりゆくよしなし事を、そこはかとなく書きつくれば、あやしうこそものぐるほしけれ。";
var encodedUrl = HttpUtility.UrlEncode(testString, Encoding.UTF8);

You might want to change your question, as you don't really need to convert Unicode to ASCII, which is impossible. What you need instead is percent-encoding (also known as URL encoding).
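To see why plain percent-encoding of the UTF-8 text does not give the output in the question, it may help to look at what percent-encoding does to a single character. The sketch below is only an illustration: `PercentEncodingDemo` and `PercentEncodeUtf8` are hypothetical names, and unlike a real URL encoder it escapes every byte, including ones that would normally be left alone.

```csharp
using System.Linq;
using System.Text;

public static class PercentEncodingDemo
{
    // Hypothetical helper: writes each UTF-8 byte of the input as %XX.
    // A real URL encoder would leave unreserved ASCII characters unescaped.
    public static string PercentEncodeUtf8(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return string.Concat(bytes.Select(b => "%" + b.ToString("X2")));
    }
}
```

For 'つ' (U+3064) this gives %E3%81%A4, the three UTF-8 bytes of the character. The target output instead contains %26%2312388%3B, which is the percent-encoded form of the text "&#12388;", so the question's format encodes the decimal code point rather than the UTF-8 bytes.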

[EDIT]

I figured it out:

var testString = "つれづれなるまゝに、日暮らし、硯にむかひて、心にうつりゆくよしなし事を、そこはかとなく書きつくれば、あやしうこそものぐるほしけれ。";
var htmlEncoded = string.Concat(testString.Select(arg => string.Format("&#{0};", (int)arg)));
var result = HttpUtility.UrlEncode(htmlEncoded);

The result matches the encoding that you provided, except that HttpUtility.UrlEncode emits lowercase hex digits (%3b rather than %3B). Step by step:

var inputChar = 'つ';
var charValue = (int)inputChar; // 12388
var htmlEncoded = "&#" + charValue + ";"; // &#12388;
var urlEncoded = HttpUtility.UrlEncode(htmlEncoded); // %26%2312388%3b
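If the uppercase hex digits from the question matter, or the input may contain characters outside the Basic Multilingual Plane, a variant along these lines might work. It is a sketch under assumptions, not a tested drop-in: `NumericEntityEncoder` is a hypothetical name, it swaps HttpUtility.UrlEncode for System.Net.WebUtility.UrlEncode (which writes uppercase hex), and it uses char.ConvertToUtf32 so that a surrogate pair becomes one reference instead of two.

```csharp
using System;
using System.Net;
using System.Text;

public static class NumericEntityEncoder
{
    // Hypothetical helper: turns each code point into a decimal HTML
    // character reference (&#NNNNN;) and then percent-encodes the result.
    public static string Encode(string text)
    {
        if (text == null)
            return null;

        var sb = new StringBuilder();
        for (int i = 0; i < text.Length; i++)
        {
            // Throws ArgumentException if the string holds an unpaired surrogate.
            int codePoint = char.ConvertToUtf32(text, i);

            // A surrogate pair occupies two chars; skip the low surrogate.
            if (char.IsHighSurrogate(text[i]))
                i++;

            sb.Append("&#").Append(codePoint).Append(';');
        }

        // Assumption: WebUtility.UrlEncode is used here for its uppercase hex output.
        return WebUtility.UrlEncode(sb.ToString());
    }
}
```

With this, Encode("つ") yields %26%2312388%3B, the same form as the first group in the question's output.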


This is impossible. Unicode is far larger than ASCII, so you can't map every Unicode character to an ASCII character. ASCII has only 128 characters (including control characters), while Unicode defines well over a hundred thousand.


Here is a function that seems to work:

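// Requires: using System.Text; using System.Web;
public static string UrlDoubleEncode(string text)
{
    if (text == null)
        return null;

    StringBuilder sb = new StringBuilder();
    // Iterating the string as int yields the numeric value of each UTF-16 code unit.
    foreach (int i in text)
    {
        // Build the decimal HTML character reference, e.g. &#12388;
        sb.Append('&');
        sb.Append('#');
        sb.Append(i);
        sb.Append(';');
    }
    // Percent-encode the '&', '#' and ';' of every reference.
    return HttpUtility.UrlEncode(sb.ToString());
}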
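One way to check the output is to reverse the two steps and see whether the original text comes back. The following is a minimal sketch, assuming System.Net.WebUtility is available; `NumericEntityDecoder` is a hypothetical name introduced only for this check.

```csharp
using System.Net;

public static class NumericEntityDecoder
{
    // Hypothetical helper: undoes the percent-encoding first, then resolves
    // the decimal character references such as &#12388;.
    public static string Decode(string encoded)
    {
        string references = WebUtility.UrlDecode(encoded);
        return WebUtility.HtmlDecode(references);
    }
}
```

Decoding the first two groups from the question, %26%2312388%3B%26%2312428%3B, should give back つれ.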