Ruby: Fast way to filter out keywords from text based on an array of words
I have a large text string and about 200 keywords that I want to filter out of the text.
There are numerous ways to do this, but I'm not sure which is best:
1) Use a loop with a gsub for each keyword (sketched below)
2) Use a massive regular expression
Any other ideas? What would you guys suggest?
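A rough sketch of option 1, with a placeholder keyword list and text standing in for the real data:

keywords = %w[foo bar baz]
text     = "foo said hello to bar, then foo left"

filtered = text.dup
keywords.each do |kw|
  # Regexp.escape guards against keywords containing regex metacharacters;
  # \b keeps "bar" from matching inside a longer word like "barbecue".
  filtered.gsub!(/\b#{Regexp.escape(kw)}\b/i, "")
end
# filtered => " said hello to , then  left"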
A massive regex will be faster, since it walks the text only once.
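A minimal sketch of that single-pattern approach, assuming keywords is the array of ~200 words and text is the string to filter:

# Regexp.union escapes each keyword and joins them into one alternation.
pattern = /\b(?:#{Regexp.union(keywords).source})\b/i

# One pass over the whole text.
filtered = text.gsub(pattern, "")

With 200 keywords the alternation gets long, but the engine still scans the text once instead of once per keyword.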
Also, if at the end you need only the words rather than the text itself, you can turn the text into a Set of downcased words and then remove the words that appear in the filter array. This only works when the remaining "text" doesn't need to read as prose, which is usually fine for tags or full-text search.
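A minimal sketch of that idea; the word-splitting regex and variable names are placeholders:

require "set"

filter = %w[foo bar baz]                    # the ~200 keywords, downcased
words  = Set.new(text.downcase.scan(/\w+/)) # the text as a set of downcased words
words.subtract(filter)                      # removes every filtered word in place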
Create a hash with each valid keyword as a key.
keywords = %w[foo bar baz]
keywords_hash = Hash[keywords.map { |k| [k, true] }]
Assuming all keywords are three characters or longer, consist of alphanumeric characters or dashes, case is irrelevant, and you only want each keyword that is present in the text returned once:
keywords_in_text = text.downcase.scan(/[[:alnum:]-]{3,}/).select { |word|
  keywords_hash.has_key?(word)
}.uniq
This should be reasonably efficient even when both the text to be searched and the list of valid keywords are very large.
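For example, with the placeholder keywords above and a hypothetical input text:

text = "Foo went to the bar-b-que, but Baz stayed home."
text.downcase.scan(/[[:alnum:]-]{3,}/).select { |word|
  keywords_hash.has_key?(word)
}.uniq
# => ["foo", "baz"]   ("bar" is hidden inside the single token "bar-b-que")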