Objective-C: Find the most commonly used words in an NSString
I am trying to write a method:
- (NSDictionary *)wordFrequencyFromString:(NSString *)string {}
where the dictionary returned will have the words and how often they were used in the string provided. Unfortunately, I can't seem to find a way to iterate through the words in a string to analyze each one, only through each character, which seems like more work than necessary. Any suggestions?
NSString has the -enumerateSubstringsInRange:options:usingBlock: method, which enumerates all words directly and lets the standard API handle the details of finding word boundaries:
[s enumerateSubstringsInRange:NSMakeRange(0, [s length])
                      options:NSStringEnumerationByWords
                   usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    NSLog(@"%@", substring);
}];
In the enumeration block you can either update an NSMutableDictionary that maps each word to an NSNumber count, or add each word to an NSCountedSet, which keeps the counts for you.
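As a rough sketch of how the two pieces might fit together, the function below enumerates words into an NSCountedSet and then copies the counts into a dictionary. The name WordFrequencyFromString is hypothetical, and lowercasing each word is an assumption about how case should be handled; the same body could sit inside the -wordFrequencyFromString: method from the question.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answers): returns a dictionary
// mapping each lowercased word in `string` to the number of times it occurs.
static NSDictionary<NSString *, NSNumber *> *WordFrequencyFromString(NSString *string) {
    NSCountedSet *counts = [NSCountedSet set];

    // Let Foundation find the word boundaries; the block runs once per word.
    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByWords
                            usingBlock:^(NSString *substring, NSRange substringRange,
                                         NSRange enclosingRange, BOOL *stop) {
        // Assumption: "The" and "the" should count as the same word.
        [counts addObject:substring.lowercaseString];
    }];

    // Fast enumeration over a counted set visits each distinct object once.
    NSMutableDictionary<NSString *, NSNumber *> *result =
        [NSMutableDictionary dictionaryWithCapacity:counts.count];
    for (NSString *word in counts) {
        result[word] = @([counts countForObject:word]);
    }
    return [result copy];
}
```

Calling WordFrequencyFromString(@"The cat and the hat. The end!") should give a dictionary in which @"the" maps to @3 and every other word maps to @1.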
You can use componentsSeparatedByCharactersInSet: to split the string, and NSCountedSet will count the words for you.
1) Split the string into words using a combination of the punctuation, whitespace and new line character sets:
NSMutableCharacterSet *separators = [NSMutableCharacterSet punctuationCharacterSet];
[separators formUnionWithCharacterSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSArray *words = [myString componentsSeparatedByCharactersInSet:separators];
2) Count the occurrences of the words (if you want to disregard capitalization, call NSString *myString = [originalString lowercaseString]; before splitting the string into components):
NSCountedSet *frequencies = [NSCountedSet setWithArray:words];
NSUInteger aWordCount = [frequencies countForObject:@"word"];
If you are willing to change your method signature, you can just return the counted set.
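One detail worth watching with this approach: componentsSeparatedByCharactersInSet: returns an empty string between two adjacent separators (for example a comma followed by a space), so the counted set would otherwise include @"" as a "word". The sketch below combines the two steps and skips the empty pieces. The function name CountedWordsBySplitting is hypothetical, and lowercasing is an assumption.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): splits `string` on
// punctuation and whitespace, then counts the non-empty lowercased pieces.
static NSCountedSet *CountedWordsBySplitting(NSString *string) {
    NSMutableCharacterSet *separators = [NSMutableCharacterSet punctuationCharacterSet];
    [separators formUnionWithCharacterSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

    // Assumption: capitalization should be ignored, so lowercase before splitting.
    NSArray<NSString *> *pieces =
        [string.lowercaseString componentsSeparatedByCharactersInSet:separators];

    NSCountedSet *frequencies = [NSCountedSet set];
    for (NSString *piece in pieces) {
        // Adjacent separators such as ", " yield empty strings; skip them.
        if (piece.length > 0) {
            [frequencies addObject:piece];
        }
    }
    return frequencies;
}
```

Because the apostrophe belongs to the punctuation set, a word like "don't" is split into "don" and "t" here; the word-enumeration approach may be a better fit if that matters.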
Split the string into an array of words using -[NSString componentsSeparatedByCharactersInSet:] first. (Use [[NSCharacterSet letterCharacterSet] invertedSet] as the argument to split on all non-letter characters.)
I used the following approach to get the most common word from an NSString.
- (void)countMostFrequentWordInSpeech:(NSString *)speechString
{
    NSString *string = speechString;
    NSCountedSet *countedSet = [NSCountedSet new];
    [string enumerateSubstringsInRange:NSMakeRange(0, [string length])
                               options:NSStringEnumerationByWords | NSStringEnumerationLocalized
                            usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        [countedSet addObject:substring];
    }];
    // NSLog(@"%@", countedSet);
    // Sort the counted set so that the most frequent word ends up at index 0 of the resulting array
    NSMutableArray *dictArray = [NSMutableArray array];
    [countedSet enumerateObjectsUsingBlock:^(id obj, BOOL *stop) {
        [dictArray addObject:@{@"object": obj,
                               @"count": @([countedSet countForObject:obj])}];
    }];
    NSArray *sortedArrayOfWord = [dictArray sortedArrayUsingDescriptors:@[[NSSortDescriptor sortDescriptorWithKey:@"count" ascending:NO]]];
    if (sortedArrayOfWord.count > 0)
    {
        self.mostFrequentWordLabel.text = [NSString stringWithFormat:@"Frequent Word: %@", [[sortedArrayOfWord[0] valueForKey:@"object"] capitalizedString]];
    }
}
"speechString" is the string from which I need the most frequent word. The object at index 0 of the array "sortedArrayOfWord" is the most common word.
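If several of the most common words are needed rather than just one, the sorting step can be separated from the UI code. The following is only a sketch: MostFrequentWords is a hypothetical name, and breaking ties alphabetically is an assumption made so that the order is deterministic.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): returns up to `limit`
// words from `counts`, ordered from most to least frequent.
static NSArray<NSString *> *MostFrequentWords(NSCountedSet *counts, NSUInteger limit) {
    NSArray<NSString *> *sorted =
        [counts.allObjects sortedArrayUsingComparator:^NSComparisonResult(NSString *a, NSString *b) {
            NSUInteger countA = [counts countForObject:a];
            NSUInteger countB = [counts countForObject:b];
            if (countA != countB) {
                // Higher counts sort first.
                return countA > countB ? NSOrderedAscending : NSOrderedDescending;
            }
            // Assumption: equal counts are ordered alphabetically.
            return [a compare:b];
        }];
    return [sorted subarrayWithRange:NSMakeRange(0, MIN(limit, sorted.count))];
}
```

The first element of the returned array plays the same role as sortedArrayOfWord[0] above.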