Function to do character conversion automatically
I have this code:
- (void)parser:(NSXMLParser *)parser foundCDATA:(NSData *)CDATABlock
{
NSString *someString = [[NSString alloc] initWithData:CDATABlock encoding:NSUTF8StringEncoding];
someString = [ someString stringByReplacingOccurrencesOfString:@"%" withString: @"&" ];
someString = [ someString stringByReplacingOccurrencesOfString:@"&#124;" withString: @"|" ];
someString = [ someString stringByReplacingOccurrencesOfString:@"&#160;" withString: @" " ];
someString = [ someString stringByReplacingOccurrencesOfString:@"&#8211;" withString:@"-"];
someString = [ someString stringByReplacingOccurrencesOfString:@"&#8212;" withString:@"——"];
someString = [ someString stringByReplacingOccurrencesOfString:@"&#8216;" withString:@"'" ];
someString = [ someString stringByReplacingOccurrencesOfString:@"&#8217;" withString:@"'" ];
someString = [ someString stringByReplacingOccurrencesOfString:@"&#8218;" withString:@"," ];
someString = [ someString stringByReplacingOccurrencesOfString:@"&#8220;" withString:@"\"" ];
someString = [ someString stringByReplacingOccurrencesOfString:@"&#8221;" withString:@"\"" ];
someString = [ someString stringByReplacingOccurrencesOfString:@"&#8230;" withString:@"..."];
someString = [ someString stringByReplacingOccurrencesOfString:@"&" withString:@"<"];
someString = [ someString stringByReplacingOccurrencesOfString:@"'" withString:@">"];
someString = [ someString stringByReplacingOccurrencesOfString:@"&#8364;" withString:@"€"];
someString = [ someString stringByReplacingOccurrencesOfString:@"&#8594;" withString:@"→"];
if(nil != self.currentItemValue){
[self.currentItemValue appendString:someString];
}
}
Is there a function to do this character conversion automatically?
Instead of hardcoding the replacement like that, there's a better way.
These entities are of the form &# + decimal number + ;. The decimal number is the base-10 value of that character's Unicode code point. So you could search for substrings in this format, extract the number, and convert it to a character directly.
Here's one way to do it, using RegexKitLite to find the strings:
NSString * source = @"&#38; &#39; &#124; &#160; &#8211; &#8212; &#8216; &#8217; &#8218; &#8220; &#8221; &#8230; &#8364; &#8594;";
NSString * regex = @"&#(\\d+);";
NSArray * matches = [source arrayOfCaptureComponentsMatchedByRegex:regex];
NSMutableString * decodedSource = [source mutableCopy];
for (NSArray * match in matches) {
NSString * fullMatch = [match objectAtIndex:0];
NSString * decimalCode = [match objectAtIndex:1];
unichar character = (unichar)[decimalCode intValue];
NSString * replacement = [NSString stringWithFormat:@"%C", character];
[decodedSource replaceOccurrencesOfString:fullMatch withString:replacement options:NSLiteralSearch range:NSMakeRange(0, [decodedSource length])];
}
NSLog(@"decoded: %@", decodedSource);
[decodedSource release];
On my machine, this logs:
decoded: & ' | – — ‘ ’ ‚ “ ” … € →
This isn't the most efficient method (it's worst case an O(nm) algorithm), but it's a start. :)
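If you'd rather not pull in RegexKitLite, roughly the same idea can be sketched with Foundation's own NSRegularExpression. This is only a sketch under a few assumptions: the helper name DecodeEntitiesWithRegex is made up for illustration, it handles decimal entities only, and it only covers code points that fit in a single unichar. Walking the matches from last to first keeps the earlier ranges valid while the mutable copy shrinks, so each entity is replaced exactly once.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the answer above): decodes decimal
// entities such as &#8211; using NSRegularExpression.
static NSString *DecodeEntitiesWithRegex(NSString *source) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"&#(\\d+);"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        return source; // pattern failed to compile; return the input unchanged
    }

    NSMutableString *decoded = [NSMutableString stringWithString:source];
    NSArray *matches = [regex matchesInString:source
                                      options:0
                                        range:NSMakeRange(0, [source length])];

    // Go back to front so the ranges computed against `source` are still
    // correct for the not-yet-modified front part of `decoded`.
    for (NSTextCheckingResult *match in [matches reverseObjectEnumerator]) {
        NSString *digits = [source substringWithRange:[match rangeAtIndex:1]];
        // Assumption: the code point fits in one UTF-16 unit.
        unichar character = (unichar)[digits intValue];
        [decoded replaceCharactersInRange:[match range]
                               withString:[NSString stringWithFormat:@"%C", character]];
    }
    return decoded;
}
```

Calling DecodeEntitiesWithRegex(source) on the source string from the example should give the same decoded text as the loop above.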
Wow, that's pretty bad, as well as inefficient. At a bare minimum, please switch over to using NSMutableString and doing inline replaces instead.
In any case, you can do this in one pass, but you have to write the code yourself. You can either use NSScanner or a method like -rangeOfString:options:range: to locate each successive entity and then figure out its replacement yourself. If you're operating on an NSMutableString, you can then replace the entity with its replacement and continue searching (after modifying your location (in the case of NSScanner) or range appropriately to account for the length difference between the entity and the replacement character).
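For what it's worth, here is a rough sketch of the NSScanner variant. Instead of editing a string in place, it builds the result in a fresh NSMutableString, which sidesteps the location bookkeeping. The function name DecodeDecimalEntities is hypothetical, it assumes decimal entities only, and it only emits code points that fit in a single unichar; anything that doesn't parse as an entity is copied through untouched.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): decodes decimal
// entities such as &#8211; in a single left-to-right pass.
static NSString *DecodeDecimalEntities(NSString *source) {
    NSMutableString *result = [NSMutableString stringWithCapacity:[source length]];
    NSScanner *scanner = [NSScanner scannerWithString:source];
    [scanner setCharactersToBeSkipped:nil]; // don't silently drop whitespace

    while (![scanner isAtEnd]) {
        // Copy everything up to the next "&#" verbatim.
        NSString *text = nil;
        if ([scanner scanUpToString:@"&#" intoString:&text]) {
            [result appendString:text];
        }
        if ([scanner isAtEnd]) {
            break;
        }

        NSUInteger entityStart = [scanner scanLocation];
        int code = 0;
        if ([scanner scanString:@"&#" intoString:NULL] &&
            [scanner scanInt:&code] &&
            [scanner scanString:@";" intoString:NULL] &&
            code > 0 && code <= 0xFFFF) {
            // Well-formed entity: append the character it stands for.
            [result appendFormat:@"%C", (unichar)code];
        } else {
            // Not a well-formed entity: keep the "&" and resume right after it.
            [result appendString:@"&"];
            [scanner setScanLocation:entityStart + 1];
        }
    }
    return result;
}
```

In the parser callback from the question, that would presumably replace the whole chain of stringByReplacingOccurrencesOfString: calls with something like [self.currentItemValue appendString:DecodeDecimalEntities(someString)]. Note that scanLocation is deprecated in recent SDKs, so on newer deployment targets you may want currentIndex instead.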