Objective-C doesn't like my unichars?
Xcode complains about a "multi-character character constant" when I try to do the following:
static unichar accent_characters[] = { 'ā', 'á', 'ă', 'à' };
How do you make an array of characters when not all of them are ASCII? The following works just fine:
static unichar accent[] = { 'a', 'b', 'c' };
Workaround
The closest workaround I have found is to write the special characters as hex values, i.e. this works:
static unichar accent_characters[] = { 0x0100, 0x0101, 0x0102 };
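As a rough sketch of the hex workaround, the table below uses the code points of the four characters from the question (U+0101, U+00E1, U+0103, U+00E0) and adds a lookup. It is written in plain C so it compiles outside Xcode too. The `typedef` is a stand-in for Foundation's `unichar`, and `is_accent` is a hypothetical helper name, not part of any API.

```c
#include <stddef.h>

/* Stand-in for Foundation's unichar (a 16-bit unsigned type). Drop this
   typedef when compiling as Objective-C with Foundation imported. */
typedef unsigned short unichar;

/* UTF-16 code units for ā, á, ă, à, written as hex so the source file
   stays pure ASCII regardless of how it is saved. */
static const unichar accent_characters[] = { 0x0101, 0x00E1, 0x0103, 0x00E0 };

/* Hypothetical helper: returns 1 if c is in the table, 0 otherwise. */
int is_accent(unichar c)
{
    size_t count = sizeof accent_characters / sizeof accent_characters[0];
    for (size_t i = 0; i < count; i++) {
        if (accent_characters[i] == c) {
            return 1;
        }
    }
    return 0;
}
```

Calling `is_accent(0x00E1)` returns 1, while `is_accent('a')` returns 0.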
It's not that Objective-C doesn't like it; it's that C doesn't. The constant 'c' is for char, which has 1 byte, not unichar, which has 2 bytes. (See the note below for a bit more detail.)
There's no perfectly supported way to write a unichar constant. You can use
char* s="ü";
in a UTF-8-encoded source file to get the UTF-8 encoded C string, or
NSString* s=@"ü";
in a UTF-8-encoded source file to get an NSString. (This was not possible before 10.5. It's OK for iPhone.)
NSString itself is conceptually encoding-neutral, but if you want, you can get the individual unichar (a UTF-16 code unit) by using -characterAtIndex:.
Finally, two comments:
- If you just want to remove accents from a string, you can use a method like this, without writing the table yourself:
-(NSString*)stringWithoutAccentsFromString:(NSString*)s
{
    if (!s) return nil;
    NSMutableString *result = [NSMutableString stringWithString:s];
    // Strip the diacritics in place; no locale is passed.
    CFStringFold((CFMutableStringRef)result, kCFCompareDiacriticInsensitive, NULL);
    return result;
}
See the documentation of CFStringFold.
- If you want Unicode characters for localization/internationalization, you shouldn't embed the strings in the source code. Instead you should use Localizable.strings and NSLocalizedString. See here.
Note:
For arcane historical reasons, 'a' is an int in C; see the discussions here. In C++, it's a char. But that doesn't change the fact that writing more than one byte inside '...' is implementation-defined and not recommended. For example, see ISO C Standard 6.4.4.4, paragraph 10. However, it was common in classic Mac OS to write a four-letter code enclosed in single quotes, like 'APPL'. But that's another story...
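Both points can be checked with a small C sketch. The first function relies on the fact that a character constant has type `int` in C (it would report false if compiled as C++). The second, `four_char_code`, is a hypothetical helper that builds a four-letter code portably. GCC and Clang are documented to pack `'APPL'` the same way, first character in the most significant byte, but the standard leaves that value implementation-defined, so treat the equivalence as an assumption about those compilers.

```c
#include <stdint.h>

/* In C (unlike C++), a character constant such as 'a' has type int,
   so its size matches sizeof(int) rather than sizeof(char). */
int char_constant_is_int(void)
{
    return sizeof('a') == sizeof(int);
}

/* Hypothetical helper: packs four ASCII characters into one 32-bit value,
   first character in the most significant byte. */
uint32_t four_char_code(char a, char b, char c, char d)
{
    return ((uint32_t)(unsigned char)a << 24) |
           ((uint32_t)(unsigned char)b << 16) |
           ((uint32_t)(unsigned char)c << 8)  |
            (uint32_t)(unsigned char)d;
}
```

For example, `four_char_code('A', 'P', 'P', 'L')` yields `0x4150504C` without triggering the multi-character constant warning.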
Another complication is that accented letters are not always represented by 1 byte; it depends on the encoding. In UTF-8, they are not. In ISO-8859-1, they are. And unichar should be in UTF-16. Did you save your source code in UTF-16? I think the default in Xcode is UTF-8. GCC might do some encoding conversion depending on the setup, too...
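To make the encoding point concrete, here is a minimal sketch in plain C. It spells out the UTF-8 bytes of ā (U+0101) with escapes, so the result does not depend on how the source file is saved, and decodes them into a single UTF-16 code unit. The `typedef` stands in for Foundation's `unichar`, and `decode_two_byte_utf8` is a hypothetical helper that only handles two-byte sequences (U+0080 to U+07FF).

```c
#include <string.h>

/* Stand-in for Foundation's unichar (a 16-bit unsigned type). */
typedef unsigned short unichar;

/* The letter ā (U+0101) as UTF-8: two bytes, 0xC4 0x81. */
const char a_macron_utf8[] = "\xC4\x81";

/* Hypothetical helper: decodes one two-byte UTF-8 sequence into a
   UTF-16 code unit. Returns 0 if s does not start with a well-formed
   two-byte sequence. */
unichar decode_two_byte_utf8(const char *s)
{
    unsigned char b0 = (unsigned char)s[0];
    /* Lead byte must be 110xxxxx; 0xC0 and 0xC1 would be overlong. */
    if ((b0 & 0xE0) != 0xC0 || b0 < 0xC2) {
        return 0;
    }
    unsigned char b1 = (unsigned char)s[1];
    /* Continuation byte must be 10xxxxxx. */
    if ((b1 & 0xC0) != 0x80) {
        return 0;
    }
    return (unichar)(((b0 & 0x1F) << 6) | (b1 & 0x3F));
}
```

Here `strlen(a_macron_utf8)` is 2, and `decode_two_byte_utf8(a_macron_utf8)` gives `0x0101`, which is why a single-quoted ā in a UTF-8 source file looks like a multi-character constant to the compiler.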
Or you can just do it like this:
static unichar accent_characters[] = { L'ā', L'á', L'ă', L'à' };
L is a standard C prefix (not a keyword) that marks a wide-character literal of type wchar_t, which in practice says "I'm about to write a Unicode character".
Works fine for Objective-C too.
Note: The compiler may give you a strange warning about too many characters put inside a unichar, but in my experience you can ignore that warning. Xcode just doesn't deal with the Unicode characters the right way, but the compiler parses them properly and the result is OK, as long as each character fits in the 16 bits of a unichar.
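A small C sketch of the wide-literal approach follows. It uses a universal character name (`\u0101` for ā) so the source stays ASCII-only, and it assumes that the compiler stores the Unicode code point in `wchar_t`, which holds for GCC and Clang on common targets but is not guaranteed by the standard. The `typedef` stands in for Foundation's `unichar`, and both function names are hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/* Stand-in for Foundation's unichar (a 16-bit unsigned type). */
typedef unsigned short unichar;

/* L'...' has type wchar_t; converting to unichar keeps the low 16 bits,
   which is lossless only for characters in the Basic Multilingual Plane. */
unichar a_macron_from_wide_literal(void)
{
    wchar_t wide = L'\u0101';
    return (unichar)wide;
}

/* wchar_t is often wider than unichar (commonly 4 bytes on macOS and
   Linux), which is why the conversion above can narrow the value. */
size_t wide_char_size(void)
{
    return sizeof(wchar_t);
}
```

Under the stated assumption, `a_macron_from_wide_literal()` returns `0x0101`, the same value as in the hex workaround from the question.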
Depending on your circumstances, this may be a tidy way to do it:
NSCharacterSet* accents =
[NSCharacterSet characterSetWithCharactersInString:@"āáăà"];
And then, if you want to check if a given unichar is one of those accent characters:
if ([accents characterIsMember:someOtherUnichar])
{
}
NSString also has many methods of its own for handling NSCharacterSet objects.