2 Select or 1 Join query?

I have 2 tables:

book ( id, title, age ) ----> 100 million rows

author ( id, book_id, name, born ) ----> 10 million rows

Now suppose I have the id of some book and I need to print this page:

Title: mybook

authors: Tom, Graham, Luis, Clarke, George

So... what is the best way to do this?

1) Simple join like this:

Select book.title, author.name 
From book, author 
WHERE ( author.book_id = book.id ) AND ( book.id = 342 )

2) To avoid the join, I could run two simple queries:

Select title FROM book WHERE id = 342

Select name FROM author WHERE book_id = 342 

Which is the most efficient way?


The first one. It's only a single round trip. It requires a little processing to collapse the rows of authors into a comma-separated list like you want but that's basically boilerplate code.
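To give an idea of how small that boilerplate is, here is a minimal sketch in Python. It assumes SQLite through the standard sqlite3 module purely for illustration (the question does not say which database is in use), and the helper name book_page and the sample rows are made up.

```python
import sqlite3

def book_page(conn, book_id):
    """Run the single JOIN and collapse the author rows into one line."""
    rows = conn.execute(
        "SELECT book.title, author.name "
        "FROM book JOIN author ON author.book_id = book.id "
        "WHERE book.id = ? ORDER BY author.id",
        (book_id,),
    ).fetchall()
    if not rows:
        return None  # unknown book, or a book with no authors on file
    title = rows[0][0]  # the title repeats on every row, so take the first
    names = ", ".join(name for _, name in rows)
    return f"Title: {title}\nauthors: {names}"

# Tiny in-memory stand-in for the real tables (sample data is invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, age INTEGER);
    CREATE TABLE author (id INTEGER PRIMARY KEY, book_id INTEGER,
                         name TEXT, born INTEGER);
    INSERT INTO book VALUES (342, 'mybook', 10);
    INSERT INTO author (book_id, name, born) VALUES
        (342, 'Tom', 1950), (342, 'Graham', 1951), (342, 'Luis', 1952),
        (342, 'Clarke', 1953), (342, 'George', 1954);
""")
print(book_page(conn, 342))
```

Many databases can also build the list server-side with an aggregate, but the name and syntax of that function vary by product, so the client-side version is the portable one.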

Separate related queries are a bad habit that will kill your performance faster than most things.


The best option is to run speed tests on your own server. Depending on how often the different tables are accessed together and apart, either one could be faster.

This has been answered in depth before: LEFT JOIN vs. multiple SELECT statements


The first one, especially if you have an index on author.book_id. A clustered index would be best if you have many authors per book and your schema allows it; otherwise a non-clustered index would also help you a lot.
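A quick way to check that such an index is actually picked up is to look at the query plan. The sketch below assumes SQLite via Python's sqlite3 module, which is only a stand-in: a plain CREATE INDEX there is an ordinary secondary index, so it does not show the clustered variant, and the index name idx_author_book_id is made up. Other databases have their own EXPLAIN syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, book_id INTEGER,
                         name TEXT, born INTEGER);
    -- Hypothetical index name; the point is the indexed column.
    CREATE INDEX idx_author_book_id ON author (book_id);
""")

# Ask SQLite how it would execute the author lookup.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM author WHERE book_id = ?", (342,)
).fetchall()

# The last column of each plan row is the human-readable detail.
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)
```

If the plan text mentions the index, the lookup is an index search; if it reports a full scan of author, the index is missing or unusable for that predicate.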


Round trip minimization and promotion of sane execution plans are the most salient items on my performance list.

If static dependencies between fields in a query prevent the optimizer from using an index, then breaking the query out into separate queries may provide huge performance gains, because the indexes get used, and the gain grows with the row count of the dataset. For most database transport protocols, additional result sets mean additional round trips. This can have performance implications if data is regularly accessed over a WAN. Fortunately, there are ways to have your cake and eat it too:

Select title,NULL AS name FROM book WHERE id = 342 
UNION ALL
Select NULL,name FROM author WHERE book_id = 342 
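On the client side the combined result has to be split back into a title and a list of names. A rough sketch, again assuming SQLite through Python's sqlite3 module and a made-up helper name fetch_book_and_authors; it also assumes title and name are never NULL in the tables, since NULL is what tells the two kinds of row apart.

```python
import sqlite3

def fetch_book_and_authors(conn, book_id):
    """One round trip: the UNION ALL returns one title row plus N name rows."""
    rows = conn.execute(
        "SELECT title, NULL AS name FROM book WHERE id = ? "
        "UNION ALL "
        "SELECT NULL, name FROM author WHERE book_id = ?",
        (book_id, book_id),
    ).fetchall()
    title = None
    names = []
    for row_title, row_name in rows:
        if row_title is not None:
            title = row_title        # the row that came from book
        else:
            names.append(row_name)   # a row that came from author
    return title, names

# Invented sample data: one book with authors, one without.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, age INTEGER);
    CREATE TABLE author (id INTEGER PRIMARY KEY, book_id INTEGER,
                         name TEXT, born INTEGER);
    INSERT INTO book VALUES (342, 'mybook', 10), (7, 'lonely', 3);
    INSERT INTO author (book_id, name, born) VALUES
        (342, 'Tom', 1950), (342, 'Graham', 1951);
""")
print(fetch_book_and_authors(conn, 342))
print(fetch_book_and_authors(conn, 7))
```

Note that the title still comes back when a book has no authors, which is not the case for the plain join.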

In your specific example I would choose #1, with a warning to consider what would happen if there were no authors on file for a given book: the join as written would return no rows at all, not even the title.
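The no-authors case is easy to reproduce. This sketch (SQLite via Python's sqlite3 as an assumed stand-in, with an invented book id 7 that has no authors) compares the join from the question with a LEFT JOIN, which is one common way to keep the book row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, age INTEGER);
    CREATE TABLE author (id INTEGER PRIMARY KEY, book_id INTEGER,
                         name TEXT, born INTEGER);
    INSERT INTO book VALUES (7, 'orphan', 1);  -- no matching author rows
""")

# The join exactly as written in the question (an inner join).
inner = conn.execute(
    "SELECT book.title, author.name FROM book, author "
    "WHERE author.book_id = book.id AND book.id = ?", (7,)
).fetchall()

# LEFT JOIN keeps the book row and fills the author columns with NULL.
outer = conn.execute(
    "SELECT book.title, author.name "
    "FROM book LEFT JOIN author ON author.book_id = book.id "
    "WHERE book.id = ?", (7,)
).fetchall()

print(inner)  # []
print(outer)  # [('orphan', None)]
```

With the LEFT JOIN the page code has to skip the single NULL name when it builds the author list.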


I know it shouldn't be a consideration, but the first query will return you a result set like this:

title     name
-----------------
mybook    Tom
mybook    Graham
mybook    Luis
mybook    Clarke
mybook    George

whereas the second pair will return you a pair of result sets like this:

title
-------
mybook

and

name
--------
Tom
Graham
Luis
Clarke
George

so each approach returns the data in a different way. In this simple example the repetition of the book's title isn't going to be significant, but if instead of the title you were returning the first chapter (say), then this would be less efficient, as there would be a lot of repeated data. So while the second might take longer in the database, it might be quicker and more efficient when sending that data across the network.
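To put a rough number on the repetition, here is a small sketch that counts the characters each approach returns. It assumes SQLite via Python's sqlite3, a hypothetical first_chapter column holding 10,000 characters, and invented sample data; character counts are only a crude proxy for bytes on the wire.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- first_chapter is a hypothetical large column, not in the question
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, first_chapter TEXT);
    CREATE TABLE author (id INTEGER PRIMARY KEY, book_id INTEGER, name TEXT);
    INSERT INTO author (book_id, name) VALUES
        (342, 'Tom'), (342, 'Graham'), (342, 'Luis'),
        (342, 'Clarke'), (342, 'George');
""")
conn.execute("INSERT INTO book VALUES (342, 'mybook', ?)", ("x" * 10_000,))

def payload_size(rows):
    """Total number of characters across every field of every row."""
    return sum(len(str(value)) for row in rows for value in row)

# One join: the chapter text is repeated once per author row.
joined = conn.execute(
    "SELECT book.first_chapter, author.name "
    "FROM book JOIN author ON author.book_id = book.id "
    "WHERE book.id = 342"
).fetchall()

# Two queries: the chapter text is sent exactly once.
chapter_rows = conn.execute(
    "SELECT first_chapter FROM book WHERE id = 342").fetchall()
name_rows = conn.execute(
    "SELECT name FROM author WHERE book_id = 342").fetchall()

joined_size = payload_size(joined)
separate_size = payload_size(chapter_rows) + payload_size(name_rows)
print(joined_size, separate_size)
```

With five authors the join carries roughly five times the data of the two separate queries here, and the gap widens with each extra author.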

You need to test your actual results and see which one performs best.
