Objective-C: Find the most commonly used words in an NSString

I am trying to write a method:

- (NSDictionary *)wordFrequencyFromString:(NSString *)string {}

where the returned dictionary maps each word to the number of times it was used in the string provided. Unfortunately, I can't seem to find a way to iterate through the words in a string to analyze each one, only through each character, which seems like more work than necessary. Any suggestions?


NSString has the -enumerateSubstringsInRange:options:usingBlock: method, which enumerates all words directly and lets the standard API handle details such as determining word boundaries:

[s enumerateSubstringsInRange:NSMakeRange(0, [s length])
                      options:NSStringEnumerationByWords
                   usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
                       NSLog(@"%@", substring);
                   }];

In the enumeration block you can either use an NSMutableDictionary with the words as keys and NSNumber counts as values, or use an NSCountedSet, which keeps the counts for you.
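
As a rough sketch of how the two pieces could fit together, the function below enumerates by words, collects them in an NSCountedSet, and then copies the counts into a dictionary to match the return type asked for in the question. The function name WordFrequencyFromString is made up for illustration, and lowercasing each word is an assumption (so that "The" and "the" are counted together); drop it if case matters.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper mirroring -wordFrequencyFromString: from the question.
static NSDictionary<NSString *, NSNumber *> *WordFrequencyFromString(NSString *string) {
    NSCountedSet<NSString *> *counts = [NSCountedSet set];

    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByWords
                            usingBlock:^(NSString *substring, NSRange substringRange,
                                         NSRange enclosingRange, BOOL *stop) {
        // Assumption: counting is case-insensitive, so normalize to lowercase.
        [counts addObject:substring.lowercaseString];
    }];

    // Copy each distinct word and its count into a plain dictionary.
    NSMutableDictionary<NSString *, NSNumber *> *result =
        [NSMutableDictionary dictionaryWithCapacity:counts.count];
    for (NSString *word in counts) {
        result[word] = @([counts countForObject:word]);
    }
    return [result copy];
}
```

Calling WordFrequencyFromString(@"The cat and the hat") should give a dictionary in which @"the" maps to @2 and the other three words map to @1.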


You can use componentsSeparatedByCharactersInSet: to split the string and NSCountedSet will count the words for you.

1) Split the string into words using a combination of the punctuation, whitespace and newline character sets:

NSMutableCharacterSet *separators = [NSMutableCharacterSet punctuationCharacterSet];
[separators formUnionWithCharacterSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

NSArray *words = [myString componentsSeparatedByCharactersInSet:separators];

2) Count the occurrences of the words (if you want to disregard capitalization, you can do NSString *myString = [originalString lowercaseString]; before splitting the string into components):

NSCountedSet *frequencies = [NSCountedSet setWithArray:words];
NSUInteger aWordCount = [frequencies countForObject:@"word"];

If you are willing to change your method signature, you can just return the counted set.
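
One thing to watch for with this approach: componentsSeparatedByCharactersInSet: returns an empty string between two adjacent separators (for example, the comma and space in "Hello, world"), so the counted set would also count @"". A minimal sketch that skips those empty pieces is shown below; the function name CountedWordsBySplitting is hypothetical, and the lowercasing step is the optional one mentioned in step 2.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: split on punctuation/whitespace and count non-empty pieces.
static NSCountedSet<NSString *> *CountedWordsBySplitting(NSString *string) {
    NSMutableCharacterSet *separators = [NSMutableCharacterSet punctuationCharacterSet];
    [separators formUnionWithCharacterSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

    // Assumption: capitalization is disregarded, as suggested in step 2.
    NSArray<NSString *> *pieces =
        [string.lowercaseString componentsSeparatedByCharactersInSet:separators];

    NSCountedSet<NSString *> *frequencies = [NSCountedSet set];
    for (NSString *piece in pieces) {
        // Adjacent separators produce empty strings; do not count them as words.
        if (piece.length > 0) {
            [frequencies addObject:piece];
        }
    }
    return frequencies;
}
```

Because the apostrophe belongs to the punctuation set, a contraction such as "don't" is split into two pieces here; the word-enumeration approach may be a better fit if that matters.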


Split the string into an array of words using -[NSString componentsSeparatedByCharactersInSet:] first. (Use [[NSCharacterSet letterCharacterSet] invertedSet] as the argument to split on all non-letter characters.)


I used the following approach to get the most common word from an NSString.

-(void)countMostFrequentWordInSpeech:(NSString*)speechString
{
    NSString     *string     = speechString;
    NSCountedSet *countedSet = [NSCountedSet new];
    [string enumerateSubstringsInRange:NSMakeRange(0, [string length])
                               options:NSStringEnumerationByWords | NSStringEnumerationLocalized
                            usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop){

                                    [countedSet addObject:substring];
                            }];
    // NSLog(@"%@", countedSet);
    // Sort the counted set's entries by count; the most frequent word ends up at index 0 of the resulting array
    NSMutableArray *dictArray = [NSMutableArray array];
    [countedSet enumerateObjectsUsingBlock:^(id obj, BOOL *stop) {
        [dictArray addObject:@{@"object": obj,
                               @"count": @([countedSet countForObject:obj])}];
    }];

    NSArray *sortedArrayOfWord= [dictArray sortedArrayUsingDescriptors:@[[NSSortDescriptor sortDescriptorWithKey:@"count" ascending:NO]]];
    if (sortedArrayOfWord.count>0)
    {
        self.mostFrequentWordLabel.text=[NSString stringWithFormat:@"Frequent Word: %@", [[sortedArrayOfWord[0] valueForKey:@"object"] capitalizedString]];
    }
}

"speechString" is the string from which I want to get the most frequent word. The object at index 0 of the array "sortedArrayOfWord" is the most common word.
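
If the result is needed outside of a view controller, the same idea can be sketched as a function that returns the words ordered by frequency instead of writing to a label. The name WordsByDescendingFrequency is made up for this example; lowercasing the words and breaking ties alphabetically are assumptions added to make the ordering predictable.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: distinct words of `string`, most frequent first.
static NSArray<NSString *> *WordsByDescendingFrequency(NSString *string) {
    NSCountedSet<NSString *> *counts = [NSCountedSet set];

    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByWords | NSStringEnumerationLocalized
                            usingBlock:^(NSString *substring, NSRange substringRange,
                                         NSRange enclosingRange, BOOL *stop) {
        // Assumption: "Word" and "word" count as the same word.
        [counts addObject:substring.lowercaseString];
    }];

    return [counts.allObjects sortedArrayUsingComparator:^NSComparisonResult(NSString *a, NSString *b) {
        NSUInteger countA = [counts countForObject:a];
        NSUInteger countB = [counts countForObject:b];
        if (countA != countB) {
            // Higher counts sort first.
            return countA > countB ? NSOrderedAscending : NSOrderedDescending;
        }
        // Assumption: ties are broken alphabetically.
        return [a compare:b];
    }];
}
```

The most common word is then the firstObject of the returned array, which is nil when the string contains no words.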
