Word count in Rails?

2022-12-17 10:43

Say I have a blog model with Title and Body. How do I show the number of words in Body and the number of characters in Title? I want the output to be something like this:

Title: Lorem
Body: Lorem Lorem Lorem

This post has word count of 3.

"Lorem Lorem Lorem".scan(/\w+/).size
=> 3

UPDATE: if you need to match rock-and-roll as one word, you could do something like this:

"Lorem Lorem Lorem rock-and-roll".scan(/[\w-]+/).size
=> 4

Also:

"Lorem Lorem Lorem".split.size
=> 3
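To tie this back to the question, here is a minimal sketch of how the two counts could hang off the model. `Post`, `word_count` and `title_length` are hypothetical names, and a plain Struct stands in for the ActiveRecord model so the snippet runs outside Rails; in a real app the two methods would go in the model class.

```ruby
# Hypothetical stand-in for the blog model; in Rails this would be an
# ActiveRecord model with `title` and `body` columns.
Post = Struct.new(:title, :body) do
  # Number of whitespace-separated words in the body.
  def word_count
    body.to_s.split.size
  end

  # Number of characters (not bytes) in the title.
  def title_length
    title.to_s.length
  end
end

post = Post.new("Lorem", "Lorem Lorem Lorem")
puts "This post has word count of #{post.word_count}."
puts "The title has #{post.title_length} characters."
```

The `to_s` calls are only there so a nil title or body counts as zero instead of raising.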

If you're interested in performance, I wrote a quick benchmark:

require 'benchmark'
require 'bigdecimal/math'
require 'active_support/core_ext/string/filters'

# Where "shakespeare" is the full text of The Complete Works of William Shakespeare...

puts 'Benchmarking shakespeare.scan(/\w+/).size x50'
puts Benchmark.measure { 50.times { shakespeare.scan(/\w+/).size } }
puts 'Benchmarking shakespeare.squish.scan(/\w+/).size x50'
puts Benchmark.measure { 50.times { shakespeare.squish.scan(/\w+/).size } }
puts 'Benchmarking shakespeare.split.size x50'
puts Benchmark.measure { 50.times { shakespeare.split.size } }
puts 'Benchmarking shakespeare.squish.split.size x50'
puts Benchmark.measure { 50.times { shakespeare.squish.split.size } }

The results:

Benchmarking shakespeare.scan(/\w+/).size x50
 13.980000   0.240000  14.220000 ( 14.234612)
Benchmarking shakespeare.squish.scan(/\w+/).size x50
 40.850000   0.270000  41.120000 ( 41.109643)
Benchmarking shakespeare.split.size x50
  5.820000   0.210000   6.030000 (  6.028998)
Benchmarking shakespeare.squish.split.size x50
 31.000000   0.260000  31.260000 ( 31.268706)

In other words, squish is slow with Very Large Strings™. Other than that, split is faster than scan (more than twice as fast if you're not using squish: about 6 s versus 14 s for 50 runs).
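If you want to try the comparison without the Shakespeare text or ActiveSupport, a cut-down sketch could look like this. The synthetic `text` is an assumption standing in for the original corpus, so the absolute timings will not match the figures above, and squish is left out because it needs ActiveSupport.

```ruby
require 'benchmark'

# Synthetic stand-in for a large document (assumption: not the corpus
# used in the benchmark above).
text = ("Lorem ipsum dolor sit amet " * 20_000).freeze

Benchmark.bm(12) do |x|
  x.report('scan(/\w+/)') { 10.times { text.scan(/\w+/).size } }
  x.report('split')       { 10.times { text.split.size } }
end
```

Both calls return the same count on this input, so the only difference being measured is speed.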

The answers here have a couple of issues:

They don't account for UTF-8 and Unicode characters (diacritics): áâãêü etc.
They don't account for apostrophes and hyphens. So Joe's will be considered two words, Joe and s, which is obviously incorrect. So will twenty-two, which is a single compound word.

Something like this works better and accounts for those issues:

foo.scan(/[\p{Alpha}\-']+/)
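A quick check of that pattern on a sample string (the `WORD` constant and the sample text are just for illustration):

```ruby
# Unicode letters, hyphens and apostrophes all count as part of a word.
WORD = /[\p{Alpha}\-']+/

text = "Joe's twenty-two caçapão"
p text.scan(WORD)       # ["Joe's", "twenty-two", "caçapão"]
p text.scan(WORD).size  # 3
```

Two things to be aware of: a free-standing hyphen or apostrophe (as in "a - b") is matched as a word of its own, and digits are not part of \p{Alpha}, so numbers are skipped. Whether that matters depends on your text.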

You might want to look at my Words Counted gem. It lets you count words, their occurrences, their lengths, and a couple of other things. It's also very well documented.

counter = WordsCounted::Counter.new(post.body)
counter.word_count #=> 3
counter.most_occuring_words #=> [["lorem", 3]]
# This also takes capitalisation into account.
# So `Hello` and `hello` are counted as the same word.

"Lorem Lorem Lorem".scan(/\S+/).size
=> 3

"caçapão adipisicing elit".scan(/[\w-]+/).size 
=> 5

But as we can see, the sentence has only 3 words. The problem is the accented characters: the regex \w matches only [A-Za-z0-9_], so it doesn't treat them as word characters.
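Printing the matches instead of the count shows where the 5 comes from. The second pattern is an alternative sketch, not part of the original answer: it uses a POSIX bracket class, which is Unicode-aware for UTF-8 strings, and needs no extra library.

```ruby
text = "caçapão adipisicing elit"

# \w is ASCII-only here, so "ç" and "ã" act as separators.
p text.scan(/[\w-]+/)            # ["ca", "ap", "o", "adipisicing", "elit"]

# [[:alnum:]] also matches accented letters in a UTF-8 string.
p text.scan(/[[:alnum:]_\-]+/)   # ["caçapão", "adipisicing", "elit"]
```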

An improved solution would be:

I18n.transliterate("caçapão adipisicing elit").scan(/[\w-]+/).size
=> 3
