How to count the number of columns required by a mixed Japanese-English string?
My string contains a mix of Japanese (double-width) and English (single-width) characters:
string str = "女性love";
In C#, my method has to count Japanese characters as two columns and English characters as one, so the string above should give me 8 columns:
2 + 2 + 1 + 1 + 1 + 1 = 8
You probably want something like this. It is very rough, but with a little work you can make it much nicer:
string str = "女性love";
int iTotal = 0;
foreach (char ch in str)
{
    int iCode = ch;
    // ASCII (0-127) covers English letters, digits, spaces and punctuation: one column.
    if (iCode <= 127)
        iTotal++;
    else
        iTotal += 2; // everything else is treated as double width
}
// iTotal = 8 in this case
As for why System.Text.Encoding.UTF8.GetBytes(str).Length
returns 10: that simply follows from the UTF-8
encoding specification. Read the article Joel on Unicode in full. The part most relevant to this question is:
In UTF-8, every code point from 0-127 is stored in a single byte. Only code points 128 and above are stored using 2, 3, in fact, up to 6 bytes
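Check the code points of your Japanese characters and you will see why it returns 10: 女 (U+5973) and 性 (U+6027) each take 3 bytes in UTF-8, and the four English letters take 1 byte each, so 3 + 3 + 4 = 10.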
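To see that breakdown per character, here is a small sketch that prints each code point with its UTF-8 byte count. The class name Utf8Breakdown and the helper BytesFor are made up for illustration; Encoding.UTF8.GetByteCount is the standard .NET call. It assumes the string contains no surrogate pairs, which holds for "女性love".

```csharp
using System;
using System.Text;

public static class Utf8Breakdown
{
    // Number of UTF-8 bytes needed for a single UTF-16 char.
    // Assumption: ch is not half of a surrogate pair.
    public static int BytesFor(char ch)
    {
        return Encoding.UTF8.GetByteCount(new[] { ch });
    }

    public static void Main()
    {
        string str = "女性love";
        int total = 0;
        foreach (char ch in str)
        {
            int n = BytesFor(ch);
            total += n;
            Console.WriteLine("U+{0:X4} -> {1} byte(s)", (int)ch, n);
        }
        Console.WriteLine("total = " + total);
    }
}
```

The byte count therefore measures storage size in UTF-8, not display width, which is why it cannot be used directly as a column count.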
EDIT
Note that this code separates English characters from everything else, not only from Japanese ones. If you need to single out Japanese characters, because you may also have to deal with Arabic, Hebrew, Russian or other scripts, you need to know the code point ranges of the Japanese scripts.
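A rough sketch of such a range check could look like the following. The ranges are taken from the Unicode block charts (CJK Symbols and Punctuation, Hiragana, Katakana, CJK Unified Ideographs, and the fullwidth part of Halfwidth and Fullwidth Forms). The names ColumnWidth, IsDoubleWidth and Count are hypothetical, and the list is an assumption about what "Japanese" should cover: it is not exhaustive and does not handle surrogate pairs or the rarer extension blocks.

```csharp
using System;

public static class ColumnWidth
{
    // True for characters in the blocks named above, treated here as two columns wide.
    public static bool IsDoubleWidth(char ch)
    {
        int c = ch;
        return (c >= 0x3000 && c <= 0x303F)   // CJK Symbols and Punctuation
            || (c >= 0x3040 && c <= 0x309F)   // Hiragana
            || (c >= 0x30A0 && c <= 0x30FF)   // Katakana
            || (c >= 0x4E00 && c <= 0x9FFF)   // CJK Unified Ideographs (kanji)
            || (c >= 0xFF01 && c <= 0xFF60)   // Fullwidth ASCII variants
            || (c >= 0xFFE0 && c <= 0xFFE6);  // Fullwidth symbol variants
    }

    // Sums 2 columns per double-width character and 1 for everything else.
    public static int Count(string s)
    {
        int total = 0;
        foreach (char ch in s)
            total += IsDoubleWidth(ch) ? 2 : 1;
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(Count("女性love"));  // 2 + 2 + 1 + 1 + 1 + 1
        Console.WriteLine(Count("Привет"));    // Cyrillic stays single width
    }
}
```

Halfwidth katakana (U+FF61 to U+FF9F) are deliberately left out of the ranges, since they occupy a single column.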
Regards.
Try something like this:
int bCnt = System.Text.Encoding.UTF8.GetBytes(str).Length; //Select the appropriate encoding, if not UTF8
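One possible reading of "the appropriate encoding" is a legacy Japanese encoding. In Shift-JIS (code page 932), ASCII and halfwidth katakana take one byte while kana and kanji take two, so the byte count happens to match the column count. The sketch below assumes .NET Core 3.0 or later, where code page encodings must be registered before use; on .NET Framework the registration is usually unnecessary and CodePagesEncodingProvider may require the System.Text.Encoding.CodePages package. The class name ShiftJisWidth is hypothetical.

```csharp
using System;
using System.Text;

public static class ShiftJisWidth
{
    private static readonly Encoding Sjis;

    static ShiftJisWidth()
    {
        // Makes code page encodings such as 932 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Sjis = Encoding.GetEncoding(932);
    }

    // Byte count in Shift-JIS, used here as an approximation of the column count.
    public static int Count(string s)
    {
        return Sjis.GetByteCount(s);
    }

    public static void Main()
    {
        Console.WriteLine(Count("女性love"));
    }
}
```

Characters that Shift-JIS cannot represent are replaced with "?" by the default fallback and so count as one byte, which makes this approach reasonable only for text known to be Japanese and English.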