开发者

Performance optimization for implementation of relationship classes in Ruby

This one is related to my previous question on the performance of Arrays and Hashes in Ruby.

Prerequisites

I know that using Hashes to store lots of Objects leads to significant performance increase because of the O(1) lookup.

Now let's assume I had two classes, namely A and B and they can be related to each other, but only if a third class C exists (it's sort of a relationship class). To give a practical example, let's say I have Document, Query and the relationship class Judgement (this is from information retrieval, so basically the judgement tells you whether the document is relevant for a query or not).

Performance optimization for implementation of relationship classes in Ruby

(I hope I got this right)

The Problem

In most cases you'd like to find out how many Judgements there are for a combination of Doc开发者_高级运维ument and Query or if there are any.

In order to find out the latter, I'll iterate over each Jugdement ...

@judgements.each { |j| return true if j.document == document and j.query == query }

Now, this brings us back to a linear search again, which isn't that helpful.

How to solve it?

I was thinking about a way to have double Hashes - if there's such a thing - so that I could just look up Judgements using the Document and Query I already have.

Or is there any other way of quickly finding out whether a Judgement exists for a given pair of Document and Query?


Well, if you need performance, you can always create another data structure to facilitate indexing - in your case you can write a hash where keys would be [document, query] pairs, and values arrays of judgments. Depending on the architecture of your app, you can either update this index upon every change in your objects, or build indexes from scratch whenever you need to perform a batch of lookups.

Or, perhaps, you should leave it to the database to do your lookups instead, of course, if you have a database at all.


This

@judgements.each { |j| return true if j.document == document and j.query == query }

could be written as

@judgements.any? { |j| j.document == document and j.query == query }

I agree with Mladen Jablanović in that that's a good chance that you should let your database handle this. In MongoDB it would be something like this

db = Mongo::Connection.new.db("mydb")
judgements = db.collection("judgements")
judgement = {:judgement_no=> "2011:73", :document => 4711, :query => 42}
judgements.add(judgement)
judgements.create_index([['document', Mongo::ASCENDING], ['query', Mongo::ASCENDING]])

judgements.find({:document => 4711, :query => 42}).each { |jm| puts jm }
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜