
Ruby: Fast way to filter out keywords from text based on an array of words

I have a large text string and about 200 keywords that I want to filter out of the text.

There are numerous ways to do this, but I'm stuck on which way is the best:

1) Use a for loop with a gsub for each keyword

2) Use a massive regular expression

Any other ideas? What would you suggest?


A massive regex should be faster than one gsub per keyword, since it walks the text only once instead of once for each of the roughly 200 keywords.
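As a rough sketch of the single-regex idea, assuming whole-word, case-insensitive matching is what you want (the sample `keywords` and `text` are made up for illustration):

```ruby
# Hypothetical sample data, for illustration only.
keywords = %w[foo bar baz]
text = "Foo walked into a bar, ordered food, and left with baz."

# Regexp.union escapes each keyword and joins them with "|".
# The \b anchors stop "foo" from matching inside "food"; the i flag ignores case.
pattern = /\b(?:#{Regexp.union(keywords).source})\b/i

# One gsub pass removes every keyword; leftover runs of spaces are then collapsed.
filtered = text.gsub(pattern, "").squeeze(" ").strip
```

Build the pattern once and reuse it if you filter many texts. Note that punctuation next to a removed keyword is left in place, so you may want an extra cleanup step.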

Also, if you only need the words at the end and not the text itself, you can turn the text into a Set of downcased words and then remove the words that are in the filter array. This only works if you don't need the "text" to make sense afterwards (which is usually the case for tags or full-text search).
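A minimal sketch of the Set approach, assuming words are plain runs of alphanumeric characters (the tokenizing regex and sample data are my own assumptions, adjust them to your text):

```ruby
require "set"

# Hypothetical sample data, for illustration only.
keywords = %w[foo bar baz]
text = "Foo met Bar and then foo met qux"

# Keywords are assumed to be lowercase already.
stop_words = keywords.to_set

# Downcase, split into words, and deduplicate by converting to a Set.
words = text.downcase.scan(/[[:alnum:]]+/).to_set

# Set difference drops every word that appears in the filter list.
remaining = words - stop_words
```

Here `remaining` holds the words met, and, then and qux. Word order and repetition in the original text are lost, which is why this only suits the tags or search-index case.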


Create a hash with each valid keyword as a key, so that checking whether a word is a keyword is a single hash lookup.

keywords = %w[foo bar baz]
keywords_hash = Hash[keywords.map { |k| [k, true] }]

Assuming all keywords are at least 3 characters long and consist of alphanumeric characters or dashes, that case is irrelevant, and that you want each keyword present in the text returned only once:

keywords_in_text = text.downcase.scan(/[[:alnum:][-]]{3,}/).select { |word|
  keywords_hash.has_key? word
}.uniq

This should be reasonably efficient even when both the text to be searched and the list of valid keywords are very large.
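For an end-to-end run, here is a small sketch using made-up sample text. The `other_words` line is my own addition, not part of the approach above: it uses the same hash lookup with `reject` to get the words that are not keywords, which is closer to the filtering the question asks about.

```ruby
# Hypothetical sample data, for illustration only.
keywords = %w[foo bar baz]
keywords_hash = Hash[keywords.map { |k| [k, true] }]
text = "Foo and bar went to the foo-bar with Baz"

# Tokens of 3+ alphanumeric or dash characters; "to" is too short and is skipped.
tokens = text.downcase.scan(/[[:alnum:][-]]{3,}/)

# Keywords that occur in the text, each listed once.
keywords_in_text = tokens.select { |word| keywords_hash.has_key?(word) }.uniq

# Assumed variant: the remaining words, with the keywords filtered out.
other_words = tokens.reject { |word| keywords_hash.has_key?(word) }.uniq
```

Note that "foo-bar" counts as a single token under this regex, so it is not treated as the keywords foo and bar.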
