
Do any databases support automatic/timed closing of databases?

We would like to implement an application architecture in which we have large numbers of databases on the disk, one database file for each customer.

When a user request comes in, the database is opened (if it isn't already).

After a certain period of time with no activity, the database is closed automatically by the server, thereby freeing up database server resources.

With this architecture we should be able to have large numbers of databases on the disk, but only a subset of them loaded into the database server at any time.

The problem is that few databases seem to support the concept of automatically closing a database. It appears that maybe Microsoft SQL Server allows this, but we're using all open source technologies, so SQL Server is not an option.

We'll consider any free or open source database technology at all, but I can't see any that support auto-close functionality.

Anyone know any different?

UPDATE: looking for a solution based on Linux, not Windows.

Thanks


Have you ascertained that this is really an issue? I only mention that because the cost of an open database is likely quite small: "opening" most likely consists of syncing up any outstanding transactions waiting for the database and doing a basic consistency check (notably loading a couple of pages of data stored on the disk).

Once that is done, without activity, there isn't really a lot of data to maintain on the server.

If you think about it, the most basic functionality of a DB system is to manage the caching of database pages in memory. When a request is made for a piece of data, the system locates the actual page it's on and checks RAM to see if it's loaded. If not, it loads it from disk.

If you also notice, the vast bulk of DB metadata is stored -- in the database itself. That means that when the system wants to know anything, it effectively uses itself to locate the information, specifically via the data page caching subsystem.

Like any other cache, as data expires and is no longer needed, it gets flushed back out to disk and refetched when necessary.

So, this implies that once a database has been "opened", any information really necessary to maintain its state will likely be maintained via the data cache subsystem, and for an unused database, released back to disk to make room for current traffic.

This is why I'm curious whether you've tested your candidate DBs to see if you run into issues here, or whether the database even has the concept of "opening a database".

When we discuss this as clients, the focus tends to be on connections to the database server. But once all of those are closed, I do not think the system is going to retain any significant amount of in-memory data about a particular database that is inactive.

After all, all (ALL) of the data in a database is stored "the same": a table is a table is a table, an index is an index is an index, particularly on a central server where all of the data pages are managed as one big "soup" of data.

The only issue you might run into is if your database happens to create a file specifically for each database, and that file remains open. Eventually you may run out of file descriptors.

But most modern systems don't do that; they store everything in a big blob of files irrespective of what database or schema it belongs to (barring specific tablespace allocations that you make or the server allows, of course).
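
If you do want to sanity-check how close a process gets to that descriptor ceiling on Linux, something like the following quick probe works (a sketch only; it just reads the current process's own limit and open-descriptor count):

    # Rough Linux-only probe of file-descriptor pressure for the current process.
    import os
    import resource

    soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    open_fds = len(os.listdir("/proc/self/fd"))  # one entry per open descriptor
    print(f"open fds: {open_fds}, soft limit: {soft_limit}, hard limit: {hard_limit}")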

So, essentially, I don't think this is a problem, as I don't think modern databases really make the kind of distinctions you're talking about internally. Multiple databases or schemas are a logical artifact within the system, not a technical implementation detail, and all of the data pages end up in the same cache and use the same resources regardless of which schema, database, table, or index they come from.

I would do some testing on your database of choice to see if this is a problem. For example, you could create 1M databases, lower the database's memory as much as possible, and then just start cycling through them, opening however many at a time you feel is appropriate (10, 100, 1000, whatever) and seeing if you have any problems.
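
A minimal sketch of that kind of test, assuming PostgreSQL and the psycopg2 driver (the database names, counts, and credentials here are made up; adjust to whatever engine you're actually evaluating):

    # Hypothetical load test: open connections to many per-customer databases in
    # batches, touch each one, then close them, while you watch the server's
    # memory and file-descriptor usage.
    # Assumes databases named customer_0 .. customer_(N-1) already exist.
    import psycopg2

    NUM_DATABASES = 1000    # scale this up toward your real customer count
    OPEN_AT_A_TIME = 100    # how many to hold open concurrently

    for start in range(0, NUM_DATABASES, OPEN_AT_A_TIME):
        conns = []
        for i in range(start, min(start + OPEN_AT_A_TIME, NUM_DATABASES)):
            conn = psycopg2.connect(dbname=f"customer_{i}", user="test", host="localhost")
            cur = conn.cursor()
            cur.execute("SELECT 1")  # touch the database so pages actually get read
            conns.append(conn)
        for conn in conns:
            conn.close()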

Finally, I don't "know" any of this for any specific database; it's just gut instinct about how databases have historically been implemented.


I have this idea, assuming you are using Windows:

  1. Your database will be running as a service, and each client has their own unique service name.
  2. You write a batch file that will start/stop that service.
  3. The batch file will be called from your server whenever you want (a rough sketch follows below).
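
A rough sketch of steps 2 and 3, assuming the services follow a per-customer naming scheme like db_<customer> (the naming scheme and the Python wrapper are my own assumptions; a plain .bat file using net start / net stop would do the same job):

    # Hypothetical helper the server could call to start or stop one customer's
    # database service on Windows; service names assumed to be "db_<customer>".
    import subprocess

    def set_customer_db(customer_id: str, running: bool) -> None:
        action = "start" if running else "stop"
        subprocess.run(["net", action, f"db_{customer_id}"], check=True)

    # e.g. stop customer 42's database service after a period of inactivity:
    # set_customer_db("42", running=False)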


I understand that you could have enough customers for a process to get short on file handles. How about a pool of DB connections?

When a user request arrives, see if that user's DB is open. If so, use the connection and reset its last-access time.

If that user's DB is not open, open the connection, set the last access time, and use the connection (if there's no available connection, throw an error). Also, fork a process/thread/lightweight process/whatever you call it in your environment that checks:

If the pool has an adequate number of unused connections, the thread is done.

If not, scan for the least recently accessed 5%-25%, or those not used in the last minute/hour/day (whatever is appropriate for your user request pattern), close them, and move them to the unused pool.

Make sure you keep enough available connections in the unused pool to handle incoming requests.
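
A minimal sketch of that bookkeeping (the pool size, eviction fraction, and connect callable are placeholders rather than any particular driver's API, and this version evicts synchronously instead of in a separate thread, but the idea is the same):

    # Hypothetical per-customer connection pool with idle eviction.
    import time

    MAX_OPEN = 100         # cap on simultaneously open customer databases
    EVICT_FRACTION = 0.10  # close the least recently used 10% when the cap is hit

    open_conns = {}        # customer_id -> (connection, last_access_time)

    def get_connection(customer_id, connect):
        """Return an open connection for this customer, opening one if needed.
        `connect` is whatever callable opens that customer's database."""
        if customer_id in open_conns:
            conn, _ = open_conns[customer_id]
        else:
            if len(open_conns) >= MAX_OPEN:
                evict_oldest()
            conn = connect(customer_id)
            if conn is None:
                raise RuntimeError("no connection available")
        open_conns[customer_id] = (conn, time.time())  # refresh last-access time
        return conn

    def evict_oldest():
        """Close the least recently used slice of the pool to make room."""
        victims = sorted(open_conns, key=lambda c: open_conns[c][1])
        for customer_id in victims[:max(1, int(len(victims) * EVICT_FRACTION))]:
            conn, _ = open_conns.pop(customer_id)
            conn.close()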


MySQL with a cron job.
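
One way to read that suggestion (my interpretation, not necessarily the poster's): have cron periodically run a small script that tells MySQL to release idle table handles, e.g. with FLUSH TABLES:

    #!/usr/bin/env python3
    # Hypothetical cleanup script for cron (e.g. every 15 minutes).
    # Assumes the mysql-connector-python package and a local maintenance account.
    import mysql.connector

    conn = mysql.connector.connect(host="localhost", user="maint", password="secret")
    cur = conn.cursor()
    # FLUSH TABLES closes open table file handles; MySQL reopens them lazily the
    # next time a table is used, so inactive per-customer databases stop holding
    # descriptors between requests.
    cur.execute("FLUSH TABLES")
    cur.close()
    conn.close()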

In addition, MySQL has a very small footprint (compared to SQL Server)... one example is that it does not hog memory (and yes, I know one can cap SQL Server memory usage).

MySQL also has connection pooling, which is very efficient and useful.


I assume that by "closing databases" you mean that they would free their cache memory? Because there really is no benefit to "closing" actual files on disk; their resource usage is negligible.

Some database engines use the operating system's disk cache. MySQL's MyISAM storage engine is an example, but it doesn't offer integrity guarantees, so that excludes many of its uses. MySQL's other engines, like InnoDB, do not offer this, however.

PostgreSQL natively uses the operating system's cache as a second-level cache; while the first-level cache (shared_buffers) still consumes memory all the time, it's common to set it to only 10-25% of your memory even on performance-critical servers. The rest is free for OS-level caching: it will be used for database pages when needed and remains available to other applications when they need it.
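
For concreteness, on a machine with, say, 8 GB of RAM that advice might look like this in postgresql.conf (the exact figure is workload-dependent, not a recommendation):

    # postgresql.conf excerpt -- roughly 25% of an 8 GB machine
    shared_buffers = 2GB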
