When optimizing database queries, what exactly is the relationship between number of queries and size of queries?

To optimize application speed, the standard advice is to minimize the number of queries an application makes to the database, consolidating them into fewer queries that each retrieve more, wherever possible.

However, this advice always comes with the caution that data transferred is still data transferred: making fewer queries doesn't make the data they transfer free.

I'm in a situation where I can over-include on the query in order to cut down the number of queries, and simply remove the unwanted data in the application code.

Is there any rule of thumb for how much each query costs, to know when to optimize the number of queries versus the size of queries? I've tried to Google for objective performance analysis data, but surprisingly haven't been able to find anything like that.

Clearly this relationship will change with factors such as database size, making this somewhat individualized, but surely it is not so individualized that a broad sense of the landscape can't be drawn out?

I'm looking for general answers, but for what it's worth, I'm running an application on Heroku.com, which means Ruby on Rails with a Postgres database.


I'm firmly in the "get only what you need when you need it" camp.

Retrieving extra rows that you may or may not need (let's say, retrieving full order details when loading an order summary screen, just in case the user drills down) just results in a much more complex query, probably joining tables that won't be used most of the time.

As a DBA, the hardest queries to optimize are the ones that join a large number of tables together.

Retrieving extra columns isn't quite as bad, but if you ask for only a few key columns, the server can sometimes answer directly from a "covering index" rather than having to retrieve all columns from the base table.

I think the key to the advice you've heard is to avoid unnecessary round trips when you can get it all at once, rather than, as you seem to be describing, to "get extra data just in case you need it".

Developers are so used to "modularizing" everything that it's not at all unusual to end up with a single web page that makes hundreds or even thousands of calls to the database each time it loads. We have a commercial product in-house that we've measured making over 50,000 calls to the database for a single action.

For an example (somewhat contrived), let's say you have an "Order Summary" page that includes an "order total" field, which is the sum of all items in the "Order Detail" table. The wrong approach is:

  1. Retrieve the list of orders from the Order Header table
  2. Programmatically loop through the orders
  3. For each order, execute a query to retrieve all order detail records
  4. Programmatically add up the order items to get the total, which is displayed in the grid

Sounds crazy, right? This is more common than you think, especially when you build data-bound logic into individual web-components. Much more efficient:

  1. Make a single call to the database, with a query something like:

    SELECT oh.OrderID, oh.OrderDate, SUM(od.LineTotal) AS OrderTotal
    FROM OrderHeader oh
    INNER JOIN OrderDetail od ON oh.OrderID = od.OrderID
    GROUP BY oh.OrderID, oh.OrderDate
    
  2. Display the results in the grid.
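To make the round-trip cost concrete, here is a plain-Ruby sketch of the two approaches above. There is no real database here: the `FakeDb` class, the data shapes, and the row counts are all invented for illustration, and the point is only to count round trips.

```ruby
# Plain-Ruby sketch (no real database; FakeDb and the data shapes are
# invented for illustration) counting round trips under each approach.
class FakeDb
  attr_reader :round_trips

  def initialize(orders, details)
    @orders, @details, @round_trips = orders, details, 0
  end

  # One round trip: fetch all order headers.
  def all_orders
    @round_trips += 1
    @orders
  end

  # One round trip per call: fetch the detail rows for a single order.
  def details_for(order_id)
    @round_trips += 1
    @details.select { |d| d[:order_id] == order_id }
  end

  # One round trip total: join and aggregate, like the SQL above.
  def order_totals
    @round_trips += 1
    @details.group_by { |d| d[:order_id] }
            .transform_values { |rows| rows.sum { |r| r[:line_total] } }
  end
end

orders  = (1..100).map { |i| { id: i } }
details = orders.flat_map { |o| [{ order_id: o[:id], line_total: 5 },
                                 { order_id: o[:id], line_total: 7 }] }

# Wrong approach: 1 query for the headers + 1 per order = 101 round trips.
db1 = FakeDb.new(orders, details)
totals1 = db1.all_orders.to_h do |o|
  [o[:id], db1.details_for(o[:id]).sum { |d| d[:line_total] }]
end

# Right approach: a single aggregated query = 1 round trip.
db2 = FakeDb.new(orders, details)
totals2 = db2.order_totals

puts db1.round_trips   # 101
puts db2.round_trips   # 1
puts totals1 == totals2
```

Both versions compute identical totals; the only difference is how many times the application crosses the wire to the database, which is exactly the cost the advice is about.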


If you're looking for a rule of thumb: whenever possible, filter, sort and page in the database query. The database is optimized for these kinds of operations (set operations).
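As an illustration of that rule, here is a plain-Ruby comparison (again no real database; the table and column names are invented) of paging in the query versus paging in application code. Both produce the same page, but "paging in the application" means every row is transferred first:

```ruby
# Plain-Ruby illustration (no real database; the data is invented) of why
# filter/sort/page belongs in the query: paging in the application
# transfers every row first.
rows = (1..10_000).map { |i| { id: i, status: i.even? ? "open" : "closed" } }

# Database-side (what WHERE / ORDER BY / LIMIT / OFFSET do): the server
# scans, but only the requested page crosses the wire.
page_from_db = rows.select { |r| r[:status] == "open" }
                   .sort_by { |r| -r[:id] }
                   .drop(50).first(25)          # OFFSET 50 LIMIT 25

# Application-side: all 10,000 rows cross the wire, then Ruby filters.
transferred = rows                              # the whole table is transferred
page_in_app = transferred.select { |r| r[:status] == "open" }
                         .sort_by { |r| -r[:id] }[50, 25]

puts page_from_db == page_in_app   # same 25-row result either way
puts transferred.size              # 10000 rows moved to get those 25
```

In Rails terms, this is the difference between chaining `where`/`order`/`limit`/`offset` on an Active Record relation (so Postgres does the set operations) and calling `.all` and then filtering the resulting array in Ruby.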

Application code is best reserved for true business logic (and display logic, etc.).
