Forcing MySQL to use an non inclusive index to avoid table scan?
Background: Huge table that have field ID AUTO INCREMENT PRIMARY KEY, and another column timestamp that stores UNIX timestamp at the moment of the row insert operation. That timestamp column is not in any index and I cannot index it because of performance reasons.
Situation: We need to query this huge table for rows in past before a specific timestamp value; we could issue a SELECT
statement and specify t开发者_如何学Pythonhat condition in the WHERE
clause, but that would lead to full table scan because the timestamp column is not indexed.
Suggestion: The nature of those two columns is both increment with each row inserted: AUTO INCREMENT column increases and timestamp column increases as well. I can devide the table into two equal num of rows each time and check the timestamp at each bounds and so on till I reach one row, then run a normal SELECT
against it using that ID.
Problem with this solution: It's hard and it needs around 25 queries to complete that operation, and the number goes up as the table grow.
So the question: can MySQL be instructed to perform that operation in atomic context?
I would try building a secondary table to your primary to at least act as a limiting basis of the query. Prime the table with something like... (and obviously, put indexe by the datebasis)
create table DailyStartKey as
SELECT
DATE( FROM_UNIXTIME( YourTimeStampColumn ) ) AS DateBasis,
min( YourAutoIncColumn ) as FirstPKForDay
from
YourTable
group by
1
Then, you could prequery this table to get the minimum PK (and possibly the max if looking for a date range) in your query. To keep it maintained without having to keep rebuilding it, I would add a trigger to your main table to try an insert into this DailyStartKey table if the date has NOT already been inserted yet.
Such query to utilize this might be...
select
YourTable.*
from
( select FirstPKForDay
from DailyStartKey
where DateBasis = "2011-02-12" ) StartDate,
( select FirstPKForDay
from DailyStartKey
where DateBasis = "2011-02-25" ) LastDate,
YourTable
where
YourTable.YourAutoIncColumn >= StartDate.FirstPKForDay
and YourTable.YourAutoIncColumn <= LastDate.FirstPKForDay
EDIT to clarify another TRIGGER implementation.
To prevent the need to keep querying against the "DailyStartKey" table, you could create another table which is always and only one record with the last day the entry was created for. (or day/hour depending on the granularity you want).
Then, in your trigger to see if a new entry is needed, just
Select * from LastDateEntryTable where LastDate = CurrentDate
If one is already found, ignore.. otherwise, the trigger should do an insert into the daily key table and update the "LastDateEntryTable".
Improvising on the idea of Drapp: Create a help table and a trigger that adds 1 row into this table for every 100 in the big one. This table would be equivalent to the following view but it will have an index on datetimeBasis.
CREATE VIEW HundredRowsStartKey AS
SELECT
YourTimeStampColumn AS datetimeBasis,
YourAutoIncColumn AS id
FROM
YourTable
WHERE
YourAutoIncColumn % 100 = 0
;
Also added an improvement on the final query details, so it does a datetime scan on at most 200 rows of the big table. All the other matching rows and needed intermediate data are fetched using indexes:
- 2 searches on the help table (HundredRowsStartKey) and
- 3 range checks on the index of the big table.
Thus, the datetime range query:
SELECT *
FROM
YourTable
WHERE
YourTimeStampColumn BETWEEN "2011-02-12-01.00.23"
AND "2011-03-15-12.00.00"
;
would become:
WITH starting AS
SELECT
max(id) AS startLow
FROM
HundredRowsStartKey h
WHERE datetimeBasis <= "2011-02-12-01.00.23"
;
WITH ending AS
SELECT
max(id) AS endLow
FROM
HundredRowsStartKey h
WHERE datetimeBasis <= "2011-03-15-12.00.00"
;
SELECT *
FROM
YourTable
WHERE
-- these are guaranteed
( YourAutoIncColumn >= starting.startLow+100
AND YourAutoIncColumn <= ending.endLow-1
)
-- and these 200 we have to filter
OR
( ( YourAutoIncColumn BETWEEN starting.startLow
AND starting.startLow+99
OR
YourAutoIncColumn BETWEEN ending.endLow
AND ending.endLow+99
)
-- with the original filter
AND
( YourTimeStampColumn BETWEEN "2011-02-12-01.00.23"
AND "2011-03-15-12.00.00"
)
)
;
Minor edits may be needed to ensure that all edge cases are caught (like when starting datetime is before all recorded in the tables, etc.)
Have you considered partitioning your table based on date range? If you have the date range in your query & the table is partitioned the full table scan can be avoided. You could also partition your indexes too.
http://dev.mysql.com/doc/refman/5.1/en/partitioning-overview.html
http://dev.mysql.com/tech-resources/articles/mysql_5.1_partitions.html
精彩评论