
How do you efficiently trim an SQLite database down to a given file size?

I'm using SQLite 3.7.2 on Windows. My database is used to store log data which gets generated 24/7. The schema is basically:

CREATE TABLE log_message(id INTEGER PRIMARY KEY AUTOINCREMENT, process_id INTEGER, text TEXT);
CREATE TABLE process(id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);

The log_message.process_id field maps to process.id, thus associating each log message with the process it originates from.

Now, sooner or later the database becomes too large and I'd like to drop the oldest entries (those with the lowest log_message.id values) until the database falls below a given size again (say, 1 GB). To do so, I'm currently doing

PRAGMA page_count;
PRAGMA page_size;

after each few log messages to get the size of the database. If it exceeds my limit, I just remove a fraction (right now: 100 messages) of the log messages like this:

BEGIN TRANSACTION;
DELETE FROM log_message WHERE id IN (SELECT id FROM log_message ORDER BY id LIMIT 100);
DELETE FROM process WHERE id IN (SELECT id FROM process EXCEPT SELECT process_id FROM log_message);
COMMIT;
VACUUM;

The latter DELETE statement removes all unreferenced entries from the process table. I repeat this process until the file size is acceptable again.

This suffers from at least two issues:

  1. The approach of removing 100 log messages is quite random; I made that number up based on a few experiments. I'd like to know in advance how many entries I have to remove.
  2. The repeated VACUUM calls can take quite some time (the SQLite home page says that VACUUM can take up to half a second per MB on Linux, and I guess it's not going to be any faster on Windows).

Does anybody have other suggestions on how to do this?


When you have a "right-sized" database, count the number of log_message rows:

SELECT COUNT(*) FROM log_message;

Store this number.

When you want to shrink the file, issue the count command again. Calculate the difference, delete that many rows from your database, then VACUUM.

This can only be approximate, but it will get you near 1 GB pretty quickly. If you are still over, you can go back to the 100-rows-at-a-time method.
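The count-based trim described above can be sketched in Python with the standard sqlite3 module (the function name `trim_to_count` and the `ORDER BY id` choice for "oldest first" are my assumptions, not part of the answer):

```python
import sqlite3

def trim_to_count(conn, target_rows):
    """Delete the oldest log messages until at most target_rows remain,
    then drop unreferenced process rows and reclaim space with VACUUM."""
    (current,) = conn.execute("SELECT COUNT(*) FROM log_message").fetchone()
    excess = current - target_rows
    if excess <= 0:
        return  # already at or below the target size
    # Oldest entries have the lowest ids, so order by id ascending.
    conn.execute(
        "DELETE FROM log_message WHERE id IN "
        "(SELECT id FROM log_message ORDER BY id LIMIT ?)",
        (excess,),
    )
    # Remove process rows no longer referenced by any message.
    conn.execute(
        "DELETE FROM process WHERE id NOT IN "
        "(SELECT DISTINCT process_id FROM log_message)"
    )
    conn.commit()
    conn.execute("VACUUM")  # must run outside a transaction
```

The target row count would be the number you stored when the database was last "right-sized".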


CREATE TABLE log_messages (
  id INTEGER PRIMARY KEY,  -- no AUTOINCREMENT here
  event_time DATETIME,     -- for last-id retrieval
  message CHAR(248)        -- fixed field size
);

Let's assume an integer field is 4 bytes long, a datetime field is also 4 bytes long, and each character is one byte. Then each record is 256 bytes long, and your space limit is 1 KB: 4 records.

Initialize the table with consecutive ids:

1 | 2011-05-01 23:00:01 | null
2 | 2011-05-01 23:00:01 | null
3 | 2011-05-01 23:00:01 | null
4 | 2011-05-01 23:00:01 | null

When your program starts, you run a query like:

SELECT id FROM log_messages ORDER BY event_time DESC LIMIT 1

The result of this query is 4. Now add 1 and wrap around: since the maximum number of records is also 4, 4 + 1 wraps back to 1, and that's the id of the record you need to update.

UPDATE log_messages SET message = 'new message', event_time = datetime('now') WHERE id = 1;

For the next record, you simply add 1 to the latest id you have in memory.

Hope you get the idea.
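The ring-buffer idea above can be sketched in Python (a minimal sketch: `RING_SIZE`, the helper names, and the `id DESC` tie-breaker for rows written in the same second are my assumptions; in practice you would cache the latest id in memory as the answer suggests, instead of querying each time):

```python
import sqlite3

RING_SIZE = 4  # maximum number of slots; size limit / record size in practice

def next_slot(conn):
    """Return the id of the slot to overwrite next: one past the most
    recently written row, wrapping back to 1 after RING_SIZE."""
    (latest,) = conn.execute(
        "SELECT id FROM log_messages ORDER BY event_time DESC, id DESC LIMIT 1"
    ).fetchone()
    return latest % RING_SIZE + 1

def write_message(conn, text):
    """Overwrite the oldest slot with a new message; the file never grows."""
    slot = next_slot(conn)
    conn.execute(
        "UPDATE log_messages SET message = ?, event_time = datetime('now') "
        "WHERE id = ?",
        (text, slot),
    )
    conn.commit()
    return slot
```

Because every write is an UPDATE into a pre-allocated, fixed-size row, the database file stays at its initial size.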


If you have the required filesystem rights, the best way, I think, would be to create a new log database once the current one grows too large and apply rotation of some kind to the database files (deleting the oldest).
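One possible rotation scheme, sketched in Python (the naming convention `log.db.1`, `log.db.2`, … and the `keep` count are my assumptions; the connection must be closed before rotating):

```python
import os

def rotate_if_needed(path, max_bytes=1 << 30, keep=3):
    """If the database file at `path` has reached max_bytes, shift it aside
    (log.db -> log.db.1, log.db.1 -> log.db.2, ...), dropping the oldest
    rotated file. Returns True if a rotation happened."""
    if not os.path.exists(path) or os.path.getsize(path) < max_bytes:
        return False
    oldest = "%s.%d" % (path, keep)
    if os.path.exists(oldest):
        os.remove(oldest)  # delete the oldest log database
    for i in range(keep - 1, 0, -1):
        src = "%s.%d" % (path, i)
        if os.path.exists(src):
            os.replace(src, "%s.%d" % (path, i + 1))
    os.replace(path, path + ".1")
    return True  # caller reopens `path`, which SQLite creates fresh
```

After a rotation, simply reconnecting to the original path starts a new, empty database, so no DELETE or VACUUM is ever needed.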


Divide the given maximum file size by the page size (as given by PRAGMA page_size) to get the maximum number of pages which the database may allocate. Set this value using PRAGMA max_page_count.

By doing this, INSERT statements will yield an SQLITE_FULL error whenever the maximum size is hit. Whenever this happens, do your DELETE routine to discard the oldest entries. After doing this, you can issue INSERT again until the database is full again. And so on.

This doesn't trim the database down to a given size, but doing that is quite inefficient anyway. Instead, it's better to impose a maximum size which must not be exceeded and then keep the database file at that size, so that SQLite can re-use the allocated disk space instead of having to grow or shrink the file.
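A minimal sketch of this insert-until-full loop in Python (the helper names `open_capped`/`insert_log`, the check on the "database or disk is full" message, and the 100-row purge batch are my assumptions, not part of the answer):

```python
import sqlite3

MAX_DB_BYTES = 1 << 30  # the desired 1 GB cap

def open_capped(path):
    """Open the database and cap its size via PRAGMA max_page_count."""
    conn = sqlite3.connect(path)
    (page_size,) = conn.execute("PRAGMA page_size").fetchone()
    conn.execute("PRAGMA max_page_count = %d" % (MAX_DB_BYTES // page_size))
    return conn

def insert_log(conn, process_id, text, purge_batch=100):
    """Insert a message; on SQLITE_FULL, purge the oldest rows and retry."""
    while True:
        try:
            conn.execute(
                "INSERT INTO log_message(process_id, text) VALUES (?, ?)",
                (process_id, text),
            )
            conn.commit()
            return
        except sqlite3.OperationalError as e:
            if "full" not in str(e).lower():
                raise  # some other error; don't swallow it
            conn.rollback()
            # Freed pages go onto the freelist and are re-used by the
            # next INSERTs, so the file never grows past the cap.
            conn.execute(
                "DELETE FROM log_message WHERE id IN "
                "(SELECT id FROM log_message ORDER BY id LIMIT ?)",
                (purge_batch,),
            )
            conn.commit()
```

Note that no VACUUM appears anywhere: the whole point is that the file stays at its maximum size and SQLite recycles the freed pages internally. A production version should also cap the retry loop in case purging cannot free enough space.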


Four years late and probably a penny short, but did you ever consider having the id set in a range which you reset back to the minimum once it hits the maximum, and then, instead of inserting and deleting records, doing UPDATEs to the database?
I appreciate that you would have to store the last-used id when the program shuts down, so that you could start at the correct point when restarting, but that seems relatively trivial.
Set up this way, as long as you used a fixed record size, you would be predefining the size of your database by the number of records in the id range.
