How can I handle the time consuming SQL?

2023-01-13 04:21 问答作者：

We have a table with 6 million records, and then we have a SQL which need aroun开发者_Python百科d 7 minutes to query the result. I think the SQL cannot be optimized any more.

The query time causes our weblogic to throw the max stuck thread exception.

Is there any recommendation for me to handle this problem ?

Following is the query, but it's hard for me to change it,

SELECT * FROM  table1 
WHERE trim(StudentID) IN ('354354','0') 
AND concat(concat(substr(table1.LogDate,7,10),'/'),substr(table1.LogDate,1,5)) 
       BETWEEN '2009/02/02' AND '2009/03/02' 
AND TerminalType='1' 
AND RecStatus='0' ORDER BY StudentID, LogDate DESC, LogTime

However, I know it's time consuming for using strings to compare dates, but someone wrote before I can not change the table structure...

LogDate was defined as a string, and the format is mm/dd/yyyy, so we need to substring and concat it than we can use between ... and ... I think it's hard to optimize here.

The odds are that this query is doing a full-file scan, because you're WHERE conditions are unlikely to be able to take advantage of any indexes.

Is LogDate a date field or a text field? If it's a date field, then don't do the substr's and concat's. Just say "LogDate between '2009-02-02' and '2009-02-03' or whatever the date range is. If it's defined as a text field you should seriously consider redefining it to a date field. (If your date really is text and is written mm/dd/yyyy then your ORDER BY ... LOGDATE DESC is not going to give useful results if the dates span more than one year.)

Is it necessary to do the trim on StudentID? It is far better to clean up your data before putting it in the database then to try to clean it up every time you retrieve it.

If LogDate is defined as a date and you can trim studentid on input, then create indexes on one or both fields and the query time should fall dramatically.

Or if you want a quick and dirty solution, create an index on "trim(studentid)".

If that doesn't help, give us more info about your table layouts and indexes.

SELECT * ... WHERE trim(StudentID) IN ('354354','0')

If this is normal construct, then you need a function based index. Because without it you force the DB server to perform full table scan.

As a rule of thumb, you should avoid as much as possible use of functions in the WHERE clause. The trim(StundentID), substr(table1.LogDate,7,10) prevent DB servers from using any index or applying any optimization to the query. Try to use the native data types as much as possible e.g. DATE instead of VARCHAR for the LogDate. StudentID should be also managed properly in the client software by e.g. triming the data before INSERT/UPDATE.

If your database supports it, you might want to try a materialized view.

If not, it might be worth thinking about implementing something similar yourself, by having a scheduled job that runs a query that does the expensive trims and concats and refreshes a table with the results so that you can run a query against the better table and avoid the expensive stuff. Or use triggers to maintain such a table.

But the query time cause our weblogic to throw the max stuck thread exception.

If the query takes 7 minutes and cannot be made faster, you have to stop running this query real-time. Can you change your application to query a cached results table that you periodically refresh?

As an emergency stop-gap before that, you can implement a latch (in Java) that allows only one thread at a time to execute this query. A second thread would immediately fail with an error (instead of bringing the whole system down). That is probably not making users of this query happy, but at least it protects everyone else.

I updated the query, could you give me some advices ?

Those string manipulations make indexing pretty much impossible. Are you sure you cannot at least get rid of the "trim"? Is there really redundant whitespace in the actual data? If so, you could narrow down on just a single student_id, which should speed things up a lot.

You want a composite index on (student_id, log_date), and hopefully the complex log_date condition can still be resolved using a index range scan (for a given student id).

Without any further information about what kind of query you are executing and wheter you are using indexes or not it is hard to give any specific information.

But here are a few general tips.

Make sure you use indexes on the columns you often filter/order by.
If it is only a certain query that is way too slow, than perhaps you can prevent yourself from executing that query by automatically generating the results while the database changes. For example, instead of a count() you can usually keep a count stored somewhere.

Try to remove the trim() from the query by automatically calling trim() on your data before/while inserting it into the table. That way you can simply use an index to find the StudentID.

Also, the date filter should be possible natively in your database. Without knowing which database it might be more difficult, but something like this should probably work: LogDate BETWEEN '2009-02-02' AND '2009-02-02'

If you also add an index on all of these columns together (i.e. StudentID, LogDate, TerminalType, RecStatus and EmployeeID than it should be lightning fast.

Without knowing what database you are using and what is your table structure, its very difficult to suggest any improvement but queries can be improved by using indexes, hints, etc.

In your query the following part concat(concat(substr(table1.LogDate,7,10),'/'), substr(table1.LogDate,1,5)) BETWEEN '2009/02/02' AND '2009/02/02'

is too funny. BETWEEN '2009/02/02' AND '2009/02/02' ?? Man, what are yuu trying to do?

Can you post your table structure here?

And 6 million records is not a big thing anyway.

It is told a lot your problem is in date field. You definitely need to change your date from a string field to a native date type. If it is a legacy field that is used in your app in this exact way - you may still create a to_date(logdate, 'DD/MM/YYYY') function-based index that transforms your "string" date into a "date" date, and allows a fast already mentioned between search without modifying your table data.

This should speed things up a lot.

With the little information you have provided, my hunch is that the following clause gives us a clue:

     ... WHERE trim(StudentID) IN ('354354','0')

If you have large numbers of records with unidentified student (i.e. studentID=0) an index on studentID would be very imbalanced.

Of the 6 million records, how many have studentId=0?

Your main problem is that your query is treating everything as a string.

If LogDate is a Date WITHOUT a time component, you want something like the following

SELECT * FROM  table1 
WHERE  StudentID  IN (:SearchStudentId,0) 
AND  table1.LogDate = :SearchDate
AND TerminalType='1' 
AND RecStatus='0' 
ORDER BY EmployeeID, LogDate DESC, LogTime

If LogDate has a time component, and SearchDate does NOT have a time component, then something like this. (The .99999 will set the time to 1 second before midnight)

SELECT * FROM  table1 
WHERE  StudentID  IN (:SearchStudentId,:StudentId0) 
AND  table1.LogDate BETWEEN :SearchDate AND :SearchDate+0.99999 
AND TerminalType='1' 
AND RecStatus='0' 
ORDER BY EmployeeID, LogDate DESC, LogTime

Note the use of bind variables for the parameters that change between calls. It won't make the query much faster, but it is 'best practice'.

Depending on your calling language, you may need to add TO_DATE, etc, to cast the incoming bind variable into a Date type.

If StudentID is a char (usually the reason for using trim()) you may be able to get better performance by padding the variables instead of trimming the field, like this (assuming StudentID is a char(10)):

StudentID IN (lpad('354354',10),lpad('0',10))

This will allow the index on StudentID to be used, if one exists.

继续阅读：database oracle query-optimization

How can I handle the time consuming SQL?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？