Performance optimization for implementation of relationship classes in Ruby
This one is related to my previous question on the performance of Arrays and Hashes in Ruby.
Prerequisites
I know that using Hashes to store lots of Objects leads to significant performance increase because of the O(1) lookup.
Now let's assume I had two classes, namely A
and B
and they can be related to each other, but only if a third class C
exists (it's sort of a relationship class). To give a practical example, let's say I have Document
, Query
and the relationship class Judgement
(this is from information retrieval, so basically the judgement tells you whether the document is relevant for a query or not).
(I hope I got this right)
The Problem
In most cases you'd like to find out how many Judgements
there are for a combination of Doc开发者_高级运维ument
and Query
or if there are any.
In order to find out the latter, I'll iterate over each Jugdement
...
@judgements.each { |j| return true if j.document == document and j.query == query }
Now, this brings us back to a linear search again, which isn't that helpful.
How to solve it?
I was thinking about a way to have double Hashes - if there's such a thing - so that I could just look up Judgements
using the Document
and Query
I already have.
Or is there any other way of quickly finding out whether a Judgement exists for a given pair of Document and Query?
Well, if you need performance, you can always create another data structure to facilitate indexing - in your case you can write a hash where keys would be [document, query]
pairs, and values arrays of judgments
. Depending on the architecture of your app, you can either update this index upon every change in your objects, or build indexes from scratch whenever you need to perform a batch of lookups.
Or, perhaps, you should leave it to the database to do your lookups instead, of course, if you have a database at all.
This
@judgements.each { |j| return true if j.document == document and j.query == query }
could be written as
@judgements.any? { |j| j.document == document and j.query == query }
I agree with Mladen Jablanović in that that's a good chance that you should let your database handle this. In MongoDB it would be something like this
db = Mongo::Connection.new.db("mydb")
judgements = db.collection("judgements")
judgement = {:judgement_no=> "2011:73", :document => 4711, :query => 42}
judgements.add(judgement)
judgements.create_index([['document', Mongo::ASCENDING], ['query', Mongo::ASCENDING]])
judgements.find({:document => 4711, :query => 42}).each { |jm| puts jm }
精彩评论