Slow Queries in SQL
I'm a database newbie when it comes to even moderately large data sets. I have several SQL databases (SQLite, Postgres, and MySQL) all containing the same data, dumped from IMDB, and I want to benchmark them against each other. The main table I want to query has about 15 million rows. I want a query that crosses two movies; right now it looks like this:
SELECT * FROM acted_in
INNER JOIN actors ON acted_in.idactors = actors.idactors
WHERE acted_in.idmovies = %d OR acted_in.idmovies = %d
The parameters are randomly generated movie ids. I want to test the relative speed of the databases by running this query many times with random ids and measuring the average time it takes. My question is: is there any better way to write the same query? I want to join who acted in what with their actor information for either of the two movies, since this will be the core functionality of the project I am working on. Right now the speed is abysmal; the average time for a single query is:
sqlite: 7.160171360969543
postgres: 8.263306670188904
mysql: 13.27652293920517
This is the average time per query (over a sample of only 100 queries, but that is significant enough for now). So can I do any better? The current run time is completely unacceptable for any practical use. I don't think the join itself takes much time; removing it gives nearly the same results, so I believe the lookup is what is slow, since I don't gain a significant speed-up when I drop the join or avoid the OR conditional.
The thing you don't mention here is whether you have any indexes in the databases. Generally, the way you speed up a query (except for terribly written ones, which this is not) is by adding indexes on the columns used in the join or WHERE criteria. This slows down updates, since the indexes must be maintained whenever the table changes, but it speeds up selections on those columns quite substantially. Consider adding indexes on any columns you use that are not already primary keys, and be sure to use the same index type in all three databases so the comparison stays fair.
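As a minimal sketch, using the column names from your query (the index names here are made up; this CREATE INDEX syntax is accepted by SQLite, Postgres, and MySQL alike):

-- Index the lookup column used in the WHERE clause
CREATE INDEX idx_acted_in_idmovies ON acted_in (idmovies);
-- Index the join column, if it doesn't already have one via a key constraint
CREATE INDEX idx_acted_in_idactors ON acted_in (idactors);

With an index on idmovies, each lookup becomes an index seek on the two requested ids instead of a scan over all 15 million rows, which is almost certainly where your time is going.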
First off, microbenchmarks on databases are pretty uninformative and it's not a good idea to base your decision on them. There are dozens of better criteria for selecting a database, like reliability, behavior under heavy load, availability of certain features (e.g. extensibility, as with the PostGIS extension for Postgres, or partitioning), license (!!), and so on.
Second, if you want to tune your database or database server, there are a number of things to consider. Some important ones:
- Databases like lots of memory and fast disks, so set up your server with ample quantities of both.
- Use the query-analysis features offered by all major databases (e.g. the very visual EXPLAIN feature in pgAdmin for Postgres) to analyze the behavior of the queries that matter for your use case, and adapt the database based on what you learn from those analyses (e.g. extra or different indexes); see the sketch after this list.
- Study your database server until you understand it well; these are pretty sophisticated programs with lots of settings that influence their behavior and performance.
- Make sure you understand the workload your database is subjected to, e.g. by using a tool like pgFouine for Postgres; similar tools exist for other brands of databases.
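As a sketch of the second point, this is roughly what analyzing the question's query would look like in Postgres (EXPLAIN ANALYZE actually executes the query and reports the real plan and timings; the movie ids 1234 and 5678 are placeholders):

EXPLAIN ANALYZE
SELECT * FROM acted_in
INNER JOIN actors ON acted_in.idactors = actors.idactors
WHERE acted_in.idmovies = 1234 OR acted_in.idmovies = 5678;

If the resulting plan shows a sequential scan over acted_in, that is a strong hint that an index on idmovies is missing or not being used.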