Escaping diacritics in a UTF-8 string from C/Obj-C to JavaScript
First, a brief explanation of why I'm doing this:
I'm loading strings from XML and using them to interact with existing JavaScript functions. I only need to escape them because I'm passing them through the web view's stringByEvaluatingJavaScriptFromString: method.
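For context, the call site looks roughly like this (webView and the updateLabel function are placeholders, not names from my actual project):

NSString *value = @"é"; // in practice, loaded from the XML
NSString *js = [NSString stringWithFormat:@"updateLabel('%@')",
                         [value stringByEscapingMetacharacters]];
[webView stringByEvaluatingJavaScriptFromString:js];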
I'm using this escape function:
- (NSString *)stringByEscapingMetacharacters
{
    const char *UTF8Input = [self UTF8String];
    // Worst case: every byte becomes a 4-character octal escape like \303
    char *UTF8Output = [[NSMutableData dataWithLength:strlen(UTF8Input) * 4 + 1] mutableBytes];
    char ch, *och = UTF8Output;

    while ((ch = *UTF8Input++))
    {
        if (ch == '\'' || ch == '\\' || ch == '"')
        {
            *och++ = '\\';
            *och++ = ch;
        }
        else if (isascii(ch))
            och = vis(och, ch, VIS_NL | VIS_TAB | VIS_CSTYLE, *UTF8Input);
        else
            och += sprintf(och, "\\%03hho", ch);
    }
    return [NSString stringWithUTF8String:UTF8Output];
}
It works fine, except for diacritics. For example, "é" shows up as "Ã©".
So, how can I escape the diacritics?
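To make the failure concrete (the NSLog call here is purely illustrative, not part of my actual code):

// "é" is 0xC3 0xA9 in UTF-8. Neither byte passes isascii(), so the octal
// fallback escapes each byte on its own, emitting the literal text \303\251.
// A JavaScript string literal decodes those as two separate characters,
// U+00C3 'Ã' and U+00A9 '©', which is exactly the mangled output.
NSLog(@"%@", [@"é" stringByEscapingMetacharacters]); // prints \303\251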
You need to handle multi-byte UTF-8 sequences explicitly: your octal fallback escapes each byte of a multi-byte character separately, and JavaScript decodes each octal escape as its own Latin-1 character. Instead, when you see the lead byte of a multi-byte sequence, copy it and its continuation bytes straight through to the output. Something like this:
if (ch == '\'' || ch == '\\' || ch == '"')
{
    *och++ = '\\';
    *och++ = ch;
}
else if (((unsigned char)ch & 0xe0) == 0xc0) // 2-byte UTF-8 sequence
{
    // Copy the lead byte and its continuation byte through verbatim
    *och++ = ch;
    *och++ = *UTF8Input++;
}
else if (((unsigned char)ch & 0xf0) == 0xe0) // 3-byte UTF-8 sequence
{
    *och++ = ch;
    *och++ = *UTF8Input++;
    *och++ = *UTF8Input++;
}
else if (isascii(ch))
    och = vis(och, ch, VIS_NL | VIS_TAB | VIS_CSTYLE, *UTF8Input);
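For completeness, here is a minimal sketch of the whole method with those fixes folded in. The 4-byte branch (for characters outside the BMP, such as emoji), the includes, the category name, and the explicit NUL terminator are my additions and are untested; note also that vis() and <vis.h> are BSD facilities that ship with Mac OS X but may not be present in every SDK.

#import <Foundation/Foundation.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <vis.h>

@implementation NSString (MetacharacterEscaping)

- (NSString *)stringByEscapingMetacharacters
{
    const char *UTF8Input = [self UTF8String];
    // Worst case: every byte becomes a 4-character octal escape like \303
    char *UTF8Output = [[NSMutableData dataWithLength:strlen(UTF8Input) * 4 + 1] mutableBytes];
    char ch, *och = UTF8Output;

    while ((ch = *UTF8Input++))
    {
        if (ch == '\'' || ch == '\\' || ch == '"')
        {
            // JavaScript string metacharacters: escape with a backslash
            *och++ = '\\';
            *och++ = ch;
        }
        else if (((unsigned char)ch & 0xe0) == 0xc0) // 2-byte UTF-8 sequence
        {
            *och++ = ch;
            *och++ = *UTF8Input++;
        }
        else if (((unsigned char)ch & 0xf0) == 0xe0) // 3-byte UTF-8 sequence
        {
            *och++ = ch;
            *och++ = *UTF8Input++;
            *och++ = *UTF8Input++;
        }
        else if (((unsigned char)ch & 0xf8) == 0xf0) // 4-byte UTF-8 sequence (my addition)
        {
            *och++ = ch;
            *och++ = *UTF8Input++;
            *och++ = *UTF8Input++;
            *och++ = *UTF8Input++;
        }
        else if (isascii(ch))
        {
            // Control characters etc. get C-style escapes (\n, \t, \123)
            och = vis(och, ch, VIS_NL | VIS_TAB | VIS_CSTYLE, *UTF8Input);
        }
        else
        {
            // Stray continuation byte: fall back to an octal escape
            och += sprintf(och, "\\%03hho", ch);
        }
    }
    *och = '\0'; // dataWithLength: zero-fills, but be explicit
    return [NSString stringWithUTF8String:UTF8Output];
}

@end

Copying multi-byte sequences through untouched is safe here because every byte of a UTF-8 sequence has the high bit set, so none of them can collide with the ASCII metacharacters being escaped.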