Table Layout for perfomance (Multiple Query vs One single big query)
I have a straightforward question.
I am doing a web application that uses MySQL, and I am currently designing it. I just have a small question about performance.
I would like to know what is more efficient:
Scenario #1:
Table: Restaurant
-Name
-City
-Province
-Country
-Continent
sql =~ select * from restaurant where id = something.
or
Scenario #2:
Table: Restaurant
-Name
-City
Table: City
-Name
-Province
Table: Province
-Name
-Country
Table: Country
-Name
-Continent
Table: Continent
-Name
sql =~ [insert multiple sql queries that will output the name and the city,
with the corresponding province, country, and continent]
Logically, I think Scenario #1 is better (less query)开发者_如何学JAVA, but some people swore to me otherwise.
True, but the question is what option performs better. In that case, there is no doubt about it: option # 1 will perform better due to the query not having to JOIN with any other tables. Randolph does have a good point, whenever possible you should normalize your database structure.
If you are not experienced with database design, I'd suggest to always go with the normalized version. It's the right thing to do in most cases. You might want to denormalize your database in some cases, but then you should know exactly why are you doing that.
Note that in the second case it's not multiple queries. It's just one query, where all the tables are joined together. For example:
SELECT *
FROM restaurant
JOIN city ON city.id=restaurant.city
JOIN province ON province.id=city.province
...
Yes, it takes longer to write, but it's better than having inconsistent data in the database (maintaining a denormalized database is way harder). You can also use an ORM to do this kind of stuff for you.
The second option is a normalised structure, which means your data is less redundant, less chance for making errors, etc. I always vote for normalising data unless you're going to run into performance problems.
Incidentally, SELECT * FROM [Table]
isn't good practice anyway. You'll want to put in the column names.
Thank you guys for your input. "Normalized Database Design" was the key here. I googled it, speed-read it, and although it has a little bit less performance, the pros are really worth it.
Thanks again. (That was really fast btw!) http://en.wikipedia.org/wiki/Database_normalization
Wikipedia states that denormalized has a better performance, but I think I am just getting cocky and thinking I can handle a big denormalized database.
I'll stick with the less risky scenario. If shits hits the fan, I'll change hardware =).
Thanks again guys.
If you use the first scenario you get the problem of increased space use (for all the duplicate province, country, continent) and if you need to change the name of a city/country you need to change it in all rows where it's used.
For convenience I would use the second scenario. I don't think there will be big performance differences between the two scenarios (in the first scenario you only touch one table, but read back more data from disk, in the second scenario you read less data from the disk, but from multiple tables). It really depends on what kind of data you have there.
Edit: To explain my point above: if you keep all data in a large table then you need to actually read all the rows from the disk, even if much of the data read is the same (namely the city, province, country, continent). Even if the SQL caches data as it can it won't help here since it can't know that data from other rows is the same.
If you normalize the database and read from the restaurant table you will get ID's for the cities. Now if you have the same ID on multiple rows the SQL server will cache the data read for the city and won't hit the disk again, so it will be an increase in speed. This will be offset by the need to access a new table, but with correct indexing on the city ID that shouldn't be too much.
That's why I'm saying that with large databases the performance difference is not easy to assess and you'll be better off having a properly normalized DB.
And yes, if you use a normalized DB (second scenario) you can change the city name in one place since there will be a single row for a city. The same will work for the others (province, country, continent).
精彩评论