开发者

Which MYSQL select method?

I have 3 tables.

1st -> products 2nd -> labels 3rd -> connectionTable

开发者_高级运维

I want to select all products with their labels. I have two methods. I want to ask which one is more efficent.

1st way-> Two queries using while

$query = "SELECT * FROM products"; $result = mysql_query($query);

while($row = mysql_fetch_array($result, MYSQL_ASSOC))
{
    $query  = "SELECT * 
                 FROM connectionTable 
           INNER JOIN labels ON labels.labelID = connectionTable.labelID 
                WHERE productID = " . $row['labelID'];
..
..
} 

###################

2nd way-> using GROUP_CONCAT()

something like this:

SELECT GROUP_CONCAT(labelName) 
  FROM connectionTable 
INNER JOIN labels ON labels.labelID = connectionTable.labelID 
INNER JOIN products ON products.productID = connectionTable.productID 
     WHERE productID = " . $row['labelID'] . " GROUP BY productID;

$result = mysql_query($query);


neither approach is good. in both cases, you have a query inside a loop. that is not "two serial SQL queries", that is a query, and a second query that is run as many times as the number of rows in the first query.

what you should really be doing is adding the labels and connectionTable tables to the query outside of the loop.


Dump your query in phpMyAdmin and use EXPLAIN?

Other than that a JOIN will always be faster than a nested query.


You should be looking to do the JOIN and not the 2 queries separately and explain planning the 2 queries vs. the JOIN'ed won't tell you the whole story.

You have to remember when you execute a query, there a things going on outside of the actual query that take time and resources. Validating and parsing the SQL statement (which could be somewhat mitigated using bind variables if your version of MySQL supports them), determining the plan for retrieving the results, and network time/traffic especially if you're accessing a db on another host. If your first query returns a million rows, you're going to be executing the second query 1 million times and incuring the network overhead to send that across along with returning result set each time. This is far less efficient than sending a JOIN query once and returning the dataset as a whole and processing it. Not to mention what you are doing to the DB SQL cache without the use of bind variables.

Note that efficiency and response time aren't the same. What may be more efficient, may end up being slower from a users perspective. If a user hits the page with the 2 separate queries, he/she will most likely see results quite quickly as the individual queries in the loop execute and return small return sets that can be output to the page. To return all the rows though could take much longer than the single JOIN. In the single JOIN case, the user may wait longer before data is returned, but they will see the entirety of that data sooner.

I would go with the join and make sure you have indexes on the columns you are joining on (namely productID). A concatenated index on the label_id, label_name may help too depending on the table size etc., but this would be something you'll need to look at with EXPLAIN in order to verify. See how the response time is for your user(s) and work from there.


The first version is not running 2 queries, it's running 1 + number_of_products queries. If you are loading all products, it's easy:

  • Run SELECT * FROM products and create a map of products, so that you can access them by ID.
  • Run SELECT * FROM connectionTable JOIN labels ON labels.labelID = connectionTable.labelID, iterate over the results, look up the product in the map from the previous step and add the row to the product entry.

If you want to do this only for a limited set of products, select them, collect the product IDs and use the same query as before, but with WHERE productID IN (?, ?, ?, ...).

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜