开发者

Forcing MySQL to use an non inclusive index to avoid table scan?

Background: Huge table that have field ID AUTO INCREMENT PRIMARY KEY, and another column timestamp that stores UNIX timestamp at the moment of the row insert operation. That timestamp column is not in any index and I cannot index it because of performance reasons.

Situation: We need to query this huge table for rows in past before a specific timestamp value; we could issue a SELECT statement and specify t开发者_如何学Pythonhat condition in the WHERE clause, but that would lead to full table scan because the timestamp column is not indexed.

Suggestion: The nature of those two columns is both increment with each row inserted: AUTO INCREMENT column increases and timestamp column increases as well. I can devide the table into two equal num of rows each time and check the timestamp at each bounds and so on till I reach one row, then run a normal SELECT against it using that ID.

Problem with this solution: It's hard and it needs around 25 queries to complete that operation, and the number goes up as the table grow.

So the question: can MySQL be instructed to perform that operation in atomic context?


I would try building a secondary table to your primary to at least act as a limiting basis of the query. Prime the table with something like... (and obviously, put indexe by the datebasis)

create table DailyStartKey as 
SELECT 
      DATE( FROM_UNIXTIME( YourTimeStampColumn ) ) AS DateBasis,
      min( YourAutoIncColumn ) as FirstPKForDay
   from 
      YourTable
   group by 
      1

Then, you could prequery this table to get the minimum PK (and possibly the max if looking for a date range) in your query. To keep it maintained without having to keep rebuilding it, I would add a trigger to your main table to try an insert into this DailyStartKey table if the date has NOT already been inserted yet.

Such query to utilize this might be...

select
      YourTable.*
   from 
      ( select FirstPKForDay
            from DailyStartKey
            where DateBasis = "2011-02-12" ) StartDate,
      ( select FirstPKForDay
            from DailyStartKey
            where DateBasis = "2011-02-25" ) LastDate,
      YourTable
   where
         YourTable.YourAutoIncColumn >= StartDate.FirstPKForDay
     and YourTable.YourAutoIncColumn <= LastDate.FirstPKForDay

EDIT to clarify another TRIGGER implementation.

To prevent the need to keep querying against the "DailyStartKey" table, you could create another table which is always and only one record with the last day the entry was created for. (or day/hour depending on the granularity you want).

Then, in your trigger to see if a new entry is needed, just

Select * from LastDateEntryTable where LastDate = CurrentDate

If one is already found, ignore.. otherwise, the trigger should do an insert into the daily key table and update the "LastDateEntryTable".


Improvising on the idea of Drapp: Create a help table and a trigger that adds 1 row into this table for every 100 in the big one. This table would be equivalent to the following view but it will have an index on datetimeBasis.

CREATE VIEW HundredRowsStartKey AS
  SELECT 
      YourTimeStampColumn AS datetimeBasis,
      YourAutoIncColumn AS id
    FROM 
      YourTable
    WHERE
      YourAutoIncColumn % 100 = 0
;

Also added an improvement on the final query details, so it does a datetime scan on at most 200 rows of the big table. All the other matching rows and needed intermediate data are fetched using indexes:

  • 2 searches on the help table (HundredRowsStartKey) and
  • 3 range checks on the index of the big table.

Thus, the datetime range query:

SELECT *
  FROM 
    YourTable
  WHERE 
    YourTimeStampColumn BETWEEN "2011-02-12-01.00.23" 
                            AND "2011-03-15-12.00.00"
;

would become:

WITH starting AS
  SELECT
      max(id) AS startLow
    FROM
      HundredRowsStartKey h
    WHERE datetimeBasis <= "2011-02-12-01.00.23"
;

WITH ending AS
  SELECT
      max(id) AS endLow
    FROM
      HundredRowsStartKey h
    WHERE datetimeBasis <= "2011-03-15-12.00.00"
;

SELECT *
  FROM 
    YourTable
  WHERE
    -- these are guaranteed
    ( YourAutoIncColumn >= starting.startLow+100
    AND YourAutoIncColumn <= ending.endLow-1
    ) 
    -- and these 200 we have to filter
    OR
    ( ( YourAutoIncColumn BETWEEN starting.startLow 
                              AND starting.startLow+99
        OR
        YourAutoIncColumn BETWEEN ending.endLow 
                              AND ending.endLow+99
      )
      -- with the original filter
      AND
      ( YourTimeStampColumn BETWEEN "2011-02-12-01.00.23" 
                                AND "2011-03-15-12.00.00"
      )
    )
;

Minor edits may be needed to ensure that all edge cases are caught (like when starting datetime is before all recorded in the tables, etc.)


Have you considered partitioning your table based on date range? If you have the date range in your query & the table is partitioned the full table scan can be avoided. You could also partition your indexes too.

http://dev.mysql.com/doc/refman/5.1/en/partitioning-overview.html

http://dev.mysql.com/tech-resources/articles/mysql_5.1_partitions.html

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜