
How to count the words of a string in Ruby

I want to do something like this:

def get_count(string)
  string.split(' ').count
end

I think there might be a better way; String may have a built-in method to do this.


Array#count also accepts an argument or a block, so length is the more conventional way to get the number of elements.

def get_count(string)
  string.split(' ').length
end

Edit: If your string is really long, building an array by splitting it takes extra memory, so here is a way that avoids the intermediate array (it assumes words are separated by single spaces):

def get_count(string)
  # Start at 1 and add one for every space character.
  string.each_char.inject(1) { |m, c| c == ' ' ? m + 1 : m }
end
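
The same idea can be made tolerant of repeated spaces, tabs and newlines by counting transitions from whitespace into a word instead of counting separators. This is only a sketch: count_words_streaming is a hypothetical name, it treats any run of non-whitespace as a word, and each_char still allocates one small string per character, so measure before assuming it saves memory or time.

```ruby
# Hypothetical helper: counts words without building an array of words.
# A "word" here is any run of non-whitespace characters.
def count_words_streaming(text)
  count = 0
  in_word = false
  text.each_char do |ch|
    if ch.match?(/\s/)
      in_word = false        # a separator ends the current word
    elsif !in_word
      in_word = true         # first character of a new word
      count += 1
    end
  end
  count
end

puts count_words_streaming("this  sentence has\tfive words")
```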


If the only word boundary is a single space, just count the spaces.

puts "this sentence has five words".count(' ')+1 # => 5

If there are spaces, line endings, tabs, commas followed by a space, etc. between the words, then scanning for word boundaries is a possibility:

puts "this, is.\tfour   words".scan(/\b/).size/2 # => 4


I know this is an old question, but this might help someone stumbling here. Counting words is a complicated problem. What is a "word"? Do numbers and special characters count as words? Etc...

I wrote the words_counted gem for this purpose. It's a highly flexible, customizable string analyser. You can ask it to analyse any string for word count, word occurrences, and exclude words/characters using regexp, strings, and arrays.

counter = WordsCounted::Counter.new("Hello World!", exclude: "World")
counter.word_count #=> 1
counter.words      #=> ["Hello"]

Etc...

The documentation and full source are on GitHub.


Splitting on a regular expression that matches runs of whitespace will also cover multiple spaces:

sentence.split(/\s+/).size


String doesn't have a built-in method that does this. You can define a method in your class or extend the String class itself:

def word_count( string )
  return 0 if string.empty?

  string.split.size
end
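
Extending String itself could look roughly like the sketch below. The method name word_count is an assumption chosen for illustration, and reopening a core class affects every string in the program, so a refinement or a plain helper method may be preferable in larger codebases.

```ruby
# Reopen String to add a hypothetical word_count method.
class String
  # Counts whitespace-separated tokens. With no arguments, split ignores
  # leading, trailing and repeated whitespace, so "" and "   " give 0.
  def word_count
    split.size
  end
end

puts "this sentence has five words".word_count
puts "".word_count
```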


Regex split on any non-word character:

string.split(/\W+/).size

...although it counts a word containing an apostrophe as two words, so depending on how small the margin of error needs to be, you might want to build your own regular expression.
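
One possible custom pattern, as a sketch only: treat a word as a run of word characters that may be joined internally by single apostrophes or hyphens. WORD_PATTERN and count_words are hypothetical names, and the pattern only handles the ASCII apostrophe, not typographic quotes.

```ruby
# Hypothetical pattern: word characters, optionally joined by ' or -
# (so "don't" and "rock-and-roll" each count as one word).
WORD_PATTERN = /\w+(?:['-]\w+)*/

def count_words(text)
  # The group is non-capturing, so scan returns whole matches.
  text.scan(WORD_PATTERN).size
end

puts "don't stop".split(/\W+/).size # the apostrophe splits "don't" in two
puts count_words("don't stop")
```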


I recently found that String#count is faster than splitting up the string by over an order of magnitude.

Unfortunately, String#count only accepts strings (character sets), not a regular expression. Also, it would count two adjacent spaces as two things rather than a single thing, and you'd have to handle other whitespace characters separately.
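
A rough way around both limitations is to normalize the whitespace first and then count single spaces. The helper name count_words_by_spaces is hypothetical, and the strip, tr and squeeze calls each allocate a new string, so the speed advantage of String#count may shrink or disappear; benchmark on your own data.

```ruby
# Hypothetical helper: normalize whitespace, then count separators.
def count_words_by_spaces(text)
  # Turn tabs and line endings into spaces, then collapse runs of spaces.
  normalized = text.strip.tr("\t\n\r", " ").squeeze(" ")
  return 0 if normalized.empty?

  normalized.count(" ") + 1
end

puts count_words_by_spaces("  some word\nother\tword  ")
```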


p "  some word\nother\tword.word|word".strip.split(/\s+/).size #=> 4


I'd rather check for word boundaries directly:

"Lorem Lorem Lorem".scan(/\w+/).size
=> 3

If you need to match rock-and-roll as one word, you could do something like:

"Lorem Lorem Lorem rock-and-roll".scan(/[\w-]+/).size
=> 4