
in-memory database in Python

I'm running some queries in Python against a large database to extract statistics. I want these statistics to be held in memory so other programs can use them without going to the database.

I was thinking of how to structure them, and after trying to set up some complicated nested dictionaries, I realized that a good representation would be an SQL table. I don't want to store the data back into the persistent database, though. Are there any in-memory implementations of an SQL database that supports querying the data with SQL syntax?


SQLite3 might work. The Python interface does support the in-memory implementation that the SQLite3 C API offers.

From the spec:

You can also supply the special name :memory: to create a database in RAM.

It's also relatively cheap with transactions, depending on what you are doing. To get going, just:

import sqlite3
conn = sqlite3.connect(':memory:')

You can then proceed like you were using a regular database.
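For instance, a minimal sketch (the `stats` table and its columns are invented for illustration):

```python
import sqlite3

# ':memory:' creates a database in RAM; it vanishes when the connection closes.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE stats (name TEXT, value REAL)')
conn.executemany('INSERT INTO stats VALUES (?, ?)',
                 [('mean_latency', 12.5), ('error_rate', 0.02)])

# Query it with ordinary SQL syntax.
rows = conn.execute(
    'SELECT name, value FROM stats WHERE value > ?', (1.0,)).fetchall()
print(rows)  # [('mean_latency', 12.5)]
```

Everything else (cursors, transactions, parameterized queries) works exactly as with a file-backed SQLite database.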

Depending on your data - if you can get by with key/value (strings, hashes, lists, sets, sorted sets, etc) - Redis might be another option to explore (as you mentioned that you wanted to share with other programs).


It may not seem obvious, but pandas has a lot of relational capabilities. See comparison with SQL
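As a sketch of what that looks like in practice (the tables and values here are made up), a SQL join plus GROUP BY maps onto `merge` and `groupby`:

```python
import pandas as pd

orders = pd.DataFrame({'customer_id': [1, 1, 2], 'amount': [10.0, 20.0, 5.0]})
customers = pd.DataFrame({'customer_id': [1, 2], 'name': ['Ann', 'Bob']})

# Equivalent of:
#   SELECT name, SUM(amount) FROM orders
#   JOIN customers USING (customer_id) GROUP BY name
joined = orders.merge(customers, on='customer_id')
totals = joined.groupby('name')['amount'].sum()
print(totals)  # Ann: 30.0, Bob: 5.0
```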


I guess SQLite3 will be the best option then.

If possible, take a look at memcached (for key-value pairs, it's lightning fast!).

UPDATE 1:

HSQLDB for SQL-like tables (no Python support).


Extremely late to the party, but pyfilesystem2 (with which I am not affiliated) seems to be a perfect fit:

https://pyfilesystem2.readthedocs.io

pip install fs

from fs import open_fs
mem_fs = open_fs(u'mem://')
...


In-memory databases usually do not support a memory-paging option (for the whole database or certain tables), i.e., the total size of the database should be smaller than the available physical memory or the maximum shared-memory size.

Depending on your application, data-access pattern, size of database and available system memory for database, you have a few choices:

a. Pickled Python Data in the File System
This stores structured Python data (such as lists of dictionaries/lists/tuples/sets, dictionaries of lists, pandas DataFrames, numpy arrays, etc.) in pickled format so that it can be used immediately and conveniently once unpickled. AFAIK, Python does not implicitly use the file system as a backing store for in-memory objects, but the host operating system may swap a Python process out in favor of higher-priority ones. This is suitable for static data whose memory footprint is small compared to the available system memory. The pickled files can be copied to other computers and read by multiple dependent or independent processes on the same machine. The pickled file or in-memory representation carries some overhead beyond the raw size of the data. This is the fastest way to access the data, since it lives in the Python process's own memory and there is no query-parsing step.
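A minimal sketch of the pickle approach (the file name and stats dictionary are made up):

```python
import os
import pickle
import tempfile

# Some computed statistics to share (illustrative values).
stats = {'mean_latency': 12.5, 'error_rate': 0.02}

# Serialize the structure to a file in one step.
path = os.path.join(tempfile.mkdtemp(), 'stats.pkl')
with open(path, 'wb') as f:
    pickle.dump(stats, f)

# Any other Python process can load the structure directly,
# with no query parsing involved.
with open(path, 'rb') as f:
    loaded = pickle.load(f)
print(loaded)
```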

b. In-memory Database
This stores dynamic or static data in memory. In-memory libraries with Python API bindings include Redis, sqlite3, Berkeley DB, rqlite, etc. Different in-memory databases offer different features:

  • The database may be locked into physical memory so that the host operating system does not swap it out. However, the actual implementation of the same library may vary across operating systems.
  • The database may be served by a database server process.
  • The in-memory database may be accessed by multiple dependent or independent processes.
  • Full, partial, or no support for the ACID model.
  • The in-memory database can be persisted to physical files so that it is available after the host operating system restarts.
  • Support for snapshots and/or multiple database copies for backup or database management.
  • Support for distributed databases using master-slave or cluster models.
  • Anything from simple key-value lookup to advanced query, filter, and grouping functions (such as SQL, NoSQL).
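As one concrete illustration of persisting an in-memory database to a file, sqlite3's `Connection.backup()` (available since Python 3.7) can snapshot a `:memory:` database to disk; the table and values below are invented:

```python
import os
import sqlite3
import tempfile

# Build up an in-memory database.
mem = sqlite3.connect(':memory:')
mem.execute('CREATE TABLE stats (name TEXT, value REAL)')
mem.execute("INSERT INTO stats VALUES ('mean_latency', 12.5)")
mem.commit()

# Snapshot the in-memory database to a physical file.
path = os.path.join(tempfile.mkdtemp(), 'snapshot.db')
disk = sqlite3.connect(path)
mem.backup(disk)
disk.close()

# The snapshot survives independently of the in-memory connection.
restored = sqlite3.connect(path)
print(restored.execute('SELECT value FROM stats').fetchone())  # (12.5,)
```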

c. Memory-mapped Database/Data Structure
This stores static or dynamic data that can be larger than the physical memory of the host operating system. Python developers can use APIs such as mmap.mmap() or numpy.memmap() to map files into the process's memory space. The files can be arranged into index and data sections so that records are accessed via an index lookup; this is actually the mechanism used by various database libraries. Developers can implement custom techniques to access and update the data efficiently.
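A toy sketch of the index-lookup idea with the standard-library `mmap` module (the fixed-width record layout here is a made-up example):

```python
import mmap
import os
import struct
import tempfile

# Hypothetical data file: fixed-width records, one 8-byte double each,
# so record i lives at byte offset i * 8 (the "index" is just arithmetic).
path = os.path.join(tempfile.mkdtemp(), 'values.bin')
with open(path, 'wb') as f:
    for v in (1.5, 2.5, 3.5):
        f.write(struct.pack('d', v))

# Map the file into the process's address space and read record 2
# without loading the whole file.
with open(path, 'rb') as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    value = struct.unpack_from('d', mm, 2 * 8)[0]
    mm.close()

print(value)  # 3.5
```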


You could possibly use a database like SQLite. It's not strictly speaking in memory, but it is fairly light and would be completely separate from your main database.
