How to debug a database query for performance
I am absolutely new in databases and SQL开发者_Go百科 queries.
Q1. What are the common problems one generally face in database queries? Probably extremely large queries, performance issues, timeout and connectivity errors, import export and restoration errors
Q2. What approach and tools one should be aware of while debugging a SQL query performance?
It will be great if you can share some of the problems faced and how did you went forward for resolving it.
Try to use EXPLAIN before your queries, that will give you a great overview of what's going on under the hood.
Also see: SQL Performance Tips
Probably one of the most important things you will want to understand is indexes. One big potential bottleneck for query performance is when you are searching on a column that is not part of an index and it causes a table scan. A table scan means that every row is being scanned, so naturally the more records in the table(s), the worse the performance.
I'm not sure what database you are using, but from a SQL Server perspective you would be well served learning how to use SQL Profiler. You can also view the execution plan of a query through SQL Management Studio, this will point out where the performance problems in your query might be.
I debugged with the method shown here and one of the method worked for me.
I checked for the query running for the longest time and came to know that certain queries was stuck and was running since more than 3-4 hours. To check since how much time the query is running, run the following command:
SELECT max(now() - xact_start) FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'active');
If this is more than an hour, than this is the issue. Kill the long running connection and limit the max age of the connection from application side.
The most important thing for debugging SQL Queries is SQL Server Profiler http://msdn.microsoft.com/en-us/library/ms187929.aspx
This will give you reads, writes, time spent, etc.
Execution plans are also very helpful and will show you whether it did a scan instead of a seek or vice versa. Look into these as well http://msdn.microsoft.com/en-us/library/ms178071.aspx
If you truly are interested in performance tuning database queries you need to get a big fat book on the subject for the database of your choice. Performance tuning is very database specific. That said, one commmon issue in tuning queries that you must be aware of no matter the database is that the tuned query should return exactly the same result set as the original query but be faster. Sometimes people tuning forget this basic thing and end up with a query that is faster but doesn't do the same thing.
Problems in database queries -
Wrong results - the query doesn't actually do what you want it to do
Performance - the query times out or is too slow. Many things can cause this but a couple of common causes are bad database design (do not use EAV tables for instance), use of poorly performing query techinques such as correlated subqueries, looping through records one at a time, lack of correct indexing, statistics out of date, queries that are not sargeable and of course hardware and network issues.
For Inserts,updates, deletes - there can be problems with data integrity due to poor database design (lack of foreign keys, lack of default values, lack of constraints) or query design. SOmetimes people choose to make thier own autoincrementing field instead of using whatever comes with the datbase. These can run into concurrency issues.
Security - database does not properly protect it's data (encryption of Personal data, coding to avoid injection attacks, limiting rights to do things to the dat, etc.)
Lack of testing of edge cases - this is a frequent problem in dynamic SQl and triggers especially when developers only consider the most common case and the thing breaks when the edge case hits it.
This is a good article on how to optimize your SQL statements and things you have take into account:
Problems that show up in databases are not usually related to the queries directly but problems I have had to address are missing indexes (can cause really slow queries on data), data integrity issues because constraints were not created which allowed someone to corrupt the data.
Additionally, if you are going to use transactions, make sure you understand how they will effect concurrent usage and know that you cannot select data from a table being modified in a transaction until the transaction is completed.
Connections and timeout are not problems I have typically faced.
As far as queries - use explain plans which allow you to identify how the database engine will run your queries. (This will identify how joins will be performed and in what order and the cardinality of your joins - if you can reduce the many to many relationships in your joins your query will run faster)
精彩评论