How can I handle the time consuming SQL?
We have a table with 6 million records, and then we have a SQL which need aroun开发者_Python百科d 7 minutes to query the result. I think the SQL cannot be optimized any more.
The query time causes our weblogic to throw the max stuck thread exception.
Is there any recommendation for me to handle this problem ?
Following is the query, but it's hard for me to change it,
SELECT * FROM table1
WHERE trim(StudentID) IN ('354354','0')
AND concat(concat(substr(table1.LogDate,7,10),'/'),substr(table1.LogDate,1,5))
BETWEEN '2009/02/02' AND '2009/03/02'
AND TerminalType='1'
AND RecStatus='0' ORDER BY StudentID, LogDate DESC, LogTime
However, I know it's time consuming for using strings to compare dates, but someone wrote before I can not change the table structure...
LogDate was defined as a string, and the format is mm/dd/yyyy, so we need to substring and concat it than we can use between ... and ... I think it's hard to optimize here.
The odds are that this query is doing a full-file scan, because you're WHERE conditions are unlikely to be able to take advantage of any indexes.
Is LogDate a date field or a text field? If it's a date field, then don't do the substr's and concat's. Just say "LogDate between '2009-02-02' and '2009-02-03' or whatever the date range is. If it's defined as a text field you should seriously consider redefining it to a date field. (If your date really is text and is written mm/dd/yyyy then your ORDER BY ... LOGDATE DESC is not going to give useful results if the dates span more than one year.)
Is it necessary to do the trim on StudentID? It is far better to clean up your data before putting it in the database then to try to clean it up every time you retrieve it.
If LogDate is defined as a date and you can trim studentid on input, then create indexes on one or both fields and the query time should fall dramatically.
Or if you want a quick and dirty solution, create an index on "trim(studentid)".
If that doesn't help, give us more info about your table layouts and indexes.
SELECT * ... WHERE trim(StudentID) IN ('354354','0')
If this is normal construct, then you need a function based index. Because without it you force the DB server to perform full table scan.
As a rule of thumb, you should avoid as much as possible use of functions in the WHERE
clause. The trim(StundentID)
, substr(table1.LogDate,7,10)
prevent DB servers from using any index or applying any optimization to the query. Try to use the native data types as much as possible e.g. DATE
instead of VARCHAR
for the LogDate
. StudentID
should be also managed properly in the client software by e.g. triming the data before INSERT
/UPDATE
.
If your database supports it, you might want to try a materialized view.
If not, it might be worth thinking about implementing something similar yourself, by having a scheduled job that runs a query that does the expensive trims and concats and refreshes a table with the results so that you can run a query against the better table and avoid the expensive stuff. Or use triggers to maintain such a table.
But the query time cause our weblogic to throw the max stuck thread exception.
If the query takes 7 minutes and cannot be made faster, you have to stop running this query real-time. Can you change your application to query a cached results table that you periodically refresh?
As an emergency stop-gap before that, you can implement a latch (in Java) that allows only one thread at a time to execute this query. A second thread would immediately fail with an error (instead of bringing the whole system down). That is probably not making users of this query happy, but at least it protects everyone else.
I updated the query, could you give me some advices ?
Those string manipulations make indexing pretty much impossible. Are you sure you cannot at least get rid of the "trim"? Is there really redundant whitespace in the actual data? If so, you could narrow down on just a single student_id, which should speed things up a lot.
You want a composite index on (student_id, log_date), and hopefully the complex log_date condition can still be resolved using a index range scan (for a given student id).
Without any further information about what kind of query you are executing and wheter you are using indexes or not it is hard to give any specific information.
But here are a few general tips.
- Make sure you use indexes on the columns you often filter/order by.
- If it is only a certain query that is way too slow, than perhaps you can prevent yourself from executing that query by automatically generating the results while the database changes. For example, instead of a
count()
you can usually keep a count stored somewhere.
Try to remove the trim()
from the query by automatically calling trim()
on your data before/while inserting it into the table. That way you can simply use an index to find the StudentID
.
Also, the date
filter should be possible natively in your database. Without knowing which database it might be more difficult, but something like this should probably work: LogDate BETWEEN '2009-02-02' AND '2009-02-02'
If you also add an index on all of these columns together (i.e. StudentID
, LogDate
, TerminalType
, RecStatus
and EmployeeID
than it should be lightning fast.
Without knowing what database you are using and what is your table structure, its very difficult to suggest any improvement but queries can be improved by using indexes, hints, etc.
In your query the following part
concat(concat(substr(table1.LogDate,7,10),'/'), substr(table1.LogDate,1,5)) BETWEEN '2009/02/02' AND '2009/02/02'
is too funny. BETWEEN '2009/02/02' AND '2009/02/02' ?? Man, what are yuu trying to do?
Can you post your table structure here?
And 6 million records is not a big thing anyway.
It is told a lot your problem is in date field. You definitely need to change your date from a string field to a native date type. If it is a legacy field that is used in your app in this exact way - you may still create a to_date(logdate, 'DD/MM/YYYY')
function-based index that transforms your "string" date into a "date" date, and allows a fast already mentioned between
search without modifying your table data.
This should speed things up a lot.
With the little information you have provided, my hunch is that the following clause gives us a clue:
... WHERE trim(StudentID) IN ('354354','0')
If you have large numbers of records with unidentified student (i.e. studentID=0) an index on studentID would be very imbalanced.
Of the 6 million records, how many have studentId=0?
Your main problem is that your query is treating everything as a string.
If LogDate is a Date WITHOUT a time component, you want something like the following
SELECT * FROM table1
WHERE StudentID IN (:SearchStudentId,0)
AND table1.LogDate = :SearchDate
AND TerminalType='1'
AND RecStatus='0'
ORDER BY EmployeeID, LogDate DESC, LogTime
If LogDate has a time component, and SearchDate does NOT have a time component, then something like this. (The .99999 will set the time to 1 second before midnight)
SELECT * FROM table1
WHERE StudentID IN (:SearchStudentId,:StudentId0)
AND table1.LogDate BETWEEN :SearchDate AND :SearchDate+0.99999
AND TerminalType='1'
AND RecStatus='0'
ORDER BY EmployeeID, LogDate DESC, LogTime
Note the use of bind variables for the parameters that change between calls. It won't make the query much faster, but it is 'best practice'.
Depending on your calling language, you may need to add TO_DATE, etc, to cast the incoming bind variable into a Date type.
If StudentID is a char
(usually the reason for using trim()
) you may be able to get better performance by padding the variables instead of trimming the field, like this (assuming StudentID is a char(10)
):
StudentID IN (lpad('354354',10),lpad('0',10))
This will allow the index on StudentID to be used, if one exists.
精彩评论