sybase - fails to use index unless string is hard-coded
I'm using Sybase 12.5.3 (ASE); I'm new to Sybase though I've worked with MSSQL pretty extensively. I'm running into a scenario where a stored procedure is really very slow. I've traced the issue to a single SELECT stmt for a relatively large table. Modifying that statement dramatically improves the performance of the procedure (and reverting it drastically slows it down; i.e., the SELECT stmt is definitely the culprit).
-- Sybase optimizes and uses multi-开发者_运维知识库column index... fast!<br>
SELECT ID,status,dateTime
FROM myTable
WHERE status in ('NEW','SENT')
ORDER BY ID
-- Sybase does not use index and does very slow table scan<br>
SELECT ID,status,dateTime
FROM myTable
WHERE status in (select status from allowableStatusValues)
ORDER BY ID
The code above is an adapted/simplified version of the actual code. Note that I've already tried recompiling the procedure, updating statistics, etc.
I have no idea why Sybase ASE would choose an index only when strings are hard-coded and choose a table scan when choosing from another table. Someone please give me a clue, and thank you in advance.
1.The issue here is poor coding. In your release, poor code and poor table design are the main reasons (98%) the optimiser makes incorrect decisions (the two go hand-in-hand, I have not figured out the proportion of each). Both:
WHERE status IN ('NEW','SENT')
and
WHERE status IN (SELECT status FROM allowableStatusValues)
are substandard, because in both cases they cause ASE to create a worktable for the contents between the brackets, which can easily be avoided (and all consequential issues avoided with it). There is no possibility of statistics on a worktable, since the statistics on either t.status or s.status is missing (AdamH is correct re that point), it correctly chooses a table scan.
Subqueries have their place, but never as a substitute for a pure (the tables are related) join. The corrections are:
WHERE status = "NEW" OR status = "SENT"
and
FROM myTable t,
allowableStatusValues s
WHERE t.status = s.status
2.The statement
|Now you don't have to add an index to get statistics on a column, but it's probably the best way.
is incorrect. Never create Indices that you will not use. If you want statistics updated on a column, simply
UPDATE STATISTICS myTable (status)
3.It is important to ensure that you have current statistics on (a) all indexed columns and (b) all join columns.
4.Yes, there is no substitute for SHOWPLAN on every code segment that is intended for release, doubly so for any code with questionable performance. You can also SET NOEXEC ON
, to avoid execution, eg. for large result sets.
An index hint will work around it, but is probably not the solution.
Firstly I'd like to know if there is an index on allowableStatusValues.status, if there is then sybase will have stats on it and will have a good idea on the number of values in there. If not then the optimiser probably won't have a good idea how many different values Status may take. It's then having to make the assumption that you're going to be extracting almost all of the rows from myTable, and the best way of doing this is a table scan (if no covering index).
Now you don't have to add an index to get statistics on a column, but it's probably the best way.
If you do have an index on allowableStatusValues.status, then i'd wonder how good your stats are. Get yourself a copy of sp__optdiag. You probably also need to tune the values of "histogram tuning factor" and "number of histogram steps", increasing these slightly from the defaults will give you more detailed statistics which always helps the optimiser.
Does it still do a table scan if you replace the subquery with a join:
SELECT m.ID,m.status,m.dateTime
FROM myTable m
JOIN allowableStatusValues a on m.status = a.status
ORDER BY ID
Rather than relying on experimental observations of how long a query takes to run, I would highly recommend getting Sybase to show you the execution plans for each query, for example:
SET showplan ON
GO
-- query/procedure call goes here
SELECT id, status, datetime
FROM myTable
WHERE status IN('NEW','SENT')
ORDER BY id
GO
SET showplan OFF
GO
With SET showplan ON
, Sybase generates execution plans for every statement it executes. These can be invaluable in helping to identify where queries are not making use of appropriate indexes. For stored procedures in Sybase, the execution plan for the entire procedure is generated when the stored procedure is first executed after being compiled.
If you post the plans for each of your queries we might be able to shed more light on the problem.
Amazingly, using an index hint resolves the issue (see the (index myIndexName) line below - re-written/simplififed code below:
-- using INDEX HINT
SELECT ID,status,dateTime
FROM myTable (index myIndexName)
WHERE status in (select status from allowableStatusValues)
ORDER BY ID
Weird that I have to use this technique to avoid a table scan, but there ya go.
Garrett, by showing only the simplified code, you have likely stripped out exactly the information that would illuminate the source of the problem.
My first guess would be a type mismatch between allowableStatusValues.status and myTable.status. However, that is not the only possibility. As ninesided stated, the complete query plans (using showplan and fmtonly flags), as well as the actual table definitions and stored procedure source, is much more likely to produce a useful answer.
精彩评论