Ruby - Performance of regex

I am trying to find a better-performing way to check for an exact word match in a string. I am searching the 'title' field of a database table. The number of records varies widely, and the performance I am seeing is pretty scary.

Here are the three approaches I benchmarked:

title.split.include?(search_string)
/\b#{search_string}\b/ =~ title
title.include?(search_string)

The best performance comes from title.include?(search_string), but that is a substring test, not an exact word search, and an exact word search is what I need.
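To make the difference between the three expressions concrete, here is a small sketch. The helper names and the sample titles are made up for illustration, and the Regexp.escape call is an addition that is not in the benchmarked code; it only keeps special characters in the term literal.

```ruby
# Hypothetical helpers wrapping the three expressions from the question.
def substring_match?(title, term)
  title.include?(term)
end

def word_boundary_match?(title, term)
  # Regexp.escape is an addition: it treats characters such as "." or "+" literally.
  /\b#{Regexp.escape(term)}\b/.match?(title)
end

def split_match?(title, term)
  title.split.include?(term)
end

# Made-up sample titles, not taken from the benchmarked records.
titles = ["Bored driver sells car", "Selling my red ferrari, cheap"]
terms  = ["red", "ferrari", "red ferrari"]

titles.each do |title|
  terms.each do |term|
    puts "#{title.inspect} / #{term.inspect}: " \
         "substring=#{substring_match?(title, term)} " \
         "regex=#{word_boundary_match?(title, term)} " \
         "split=#{split_match?(title, term)}"
  end
end
```

Note that split only ever yields single whitespace-separated tokens, so split.include? cannot match a two-word term such as "red ferrari", and punctuation stays attached to tokens ("ferrari," is not equal to "ferrari").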

  require 'benchmark'

  def do_benchmark(search_results, search_string)
    n = 1000

    # Regex with a word boundary on each side of the term
    Benchmark.bm do |x|
      x.report("word search:") {
        n.times {
          search_results.each { |search_result|
            title = search_result.title
            /\b#{search_string}\b/ =~ title
          }
        }
      }
    end

    # Split the title on whitespace and look for the term among the tokens
    Benchmark.bm do |x|
      x.report("split.include? search:") {
        n.times {
          search_results.each { |search_result|
            title = search_result.title
            title.split.include?(search_string)
          }
        }
      }
    end

    # Plain substring test
    Benchmark.bm do |x|
      x.report("string include? search:") {
        n.times {
          search_results.each { |search_result|
            title = search_result.title
            title.include?(search_string)
          }
        }
      }
    end
  end

"processing: 6234 records"
"Looking for term: red ferrari"
                              user     system      total        real
 word search:            50.380000   2.600000  52.980000 ( 57.019927)
                              user     system      total        real
 split.include? search:  54.600000   0.260000  54.860000 ( 57.854837)
                              user     system      total        real
 string include? search: 21.600000   0.060000  21.660000 ( 21.949715)

Is there any way I can get better performance AND exact string match results?


You want full text search of a model field. This is best accomplished not by regex scans, but by a specialized index for full text retrieval. Rather than roll your own, I'd recommend using one of the following:

  • acts_as_indexed
  • Sphinx
  • Ferret
  • Xapian
  • Lucene/Solr

Here are some links with more detail on the options:

  • http://locomotivation.squeejee.com/post/109284085/mulling-over-our-ruby-on-rails-full-text-search-options
  • Full Text Searching with Rails
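To illustrate why an index helps, here is a toy inverted index in plain Ruby. It is only a sketch of the general idea and does not reflect how any of the libraries above are implemented or used; the class name TitleIndex and its methods are hypothetical. It matches records containing every word of the query, not the words as an adjacent phrase.

```ruby
require 'set'

# Toy inverted index: maps each lowercased word to the ids of the records
# whose title contains it. Hypothetical class, for illustration only.
class TitleIndex
  def initialize
    @postings = Hash.new { |hash, word| hash[word] = Set.new }
  end

  # Index one record's title under the given id.
  def add(id, title)
    tokenize(title).each { |word| @postings[word] << id }
  end

  # Returns the sorted ids of records whose title contains every query word.
  def search(query)
    words = tokenize(query)
    return [] if words.empty?

    # fetch avoids creating empty entries for words that were never indexed.
    sets = words.map { |word| @postings.fetch(word, Set.new) }
    sets.reduce(:&).to_a.sort
  end

  private

  # Lowercase the text and split it into runs of word characters.
  def tokenize(text)
    text.downcase.scan(/\w+/)
  end
end

index = TitleIndex.new
index.add(1, "Red ferrari for sale")
index.add(2, "Bored driver sells car")
index.add(3, "Ferrari, red, low mileage")
p index.search("red")
p index.search("red ferrari")
```

The cost of scanning every title is paid once at indexing time; each search is then a few hash lookups and set intersections rather than a pass over all records.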


Split your string on whitespace, go through each word in the resulting array, and compare it to the search term with the == operator.
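A minimal sketch of that suggestion, assuming a single-word search term (the method name exact_word? is made up for illustration):

```ruby
# Hypothetical helper: true if any whitespace-separated token equals the word.
def exact_word?(title, word)
  title.split.any? { |token| token == word }
end

p exact_word?("Selling my red ferrari", "red")
p exact_word?("Bored driver sells car", "red")
```

The comparison is case-sensitive, and punctuation stays attached to tokens, so a title containing "red," would not match "red" unless the tokens are cleaned first.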
