How to load existing db file to memory in Python sqlite3?
I have an existing sqlite3 db file on which I need to make some extensive calculations. Doing the calculations from the file is painfully slow, and since the file is not large (~10 MB) there should be no problem loading it into memory.
Is there a Pythonic way to load the existing file into memory in order to speed up the calculations?
Here is the snippet that I wrote for my flask application:
import sqlite3
from io import StringIO

def init_sqlite_db(app):
    # Read database to tempfile
    con = sqlite3.connect(app.config['SQLITE_DATABASE'])
    tempfile = StringIO()
    for line in con.iterdump():
        tempfile.write('%s\n' % line)
    con.close()
    tempfile.seek(0)

    # Create a database in memory and import from tempfile
    app.sqlite = sqlite3.connect(":memory:")
    app.sqlite.cursor().executescript(tempfile.read())
    app.sqlite.commit()
    app.sqlite.row_factory = sqlite3.Row
What about sqlite3.Connection.backup(...)? "This method makes a backup of a SQLite database even while it's being accessed by other clients, or concurrently by the same connection." Availability: SQLite 3.6.11 or higher. New in version 3.7.
import sqlite3
source = sqlite3.connect('existing_db.db')
dest = sqlite3.connect(':memory:')
source.backup(dest)
sqlite3.Connection.iterdump "[r]eturns an iterator to dump the database in an SQL text format. Useful when saving an in-memory database for later restoration. This function provides the same capabilities as the .dump command in the sqlite3 shell."
Get such an iterator and dump the disk-based database into a memory-based one, and you're ready to compute. When the computation is done, just dump the other way around back to disk.
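A minimal sketch of that round trip, assuming the file is called existing_db.db and the results should end up in results.db (both names are placeholders):

import sqlite3

# Load the on-disk database into memory via its SQL dump.
disk_con = sqlite3.connect('existing_db.db')   # placeholder path
mem_con = sqlite3.connect(':memory:')
mem_con.executescript('\n'.join(disk_con.iterdump()))
disk_con.close()

# ... run the extensive calculations against mem_con ...

# Dump the in-memory database back to a fresh file on disk.
out_con = sqlite3.connect('results.db')        # placeholder path
out_con.executescript('\n'.join(mem_con.iterdump()))
out_con.close()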
First you should try and find out what is causing the slowness you are observing. Are you writing to tables? Are your writes within large enough transactions so that you don't save needless temporary results to disk? Can you change writes to go to temporary tables (with pragma temp_store=memory)? Can you live with pragma synchronous=off?
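If those pragmas are an option, here is a sketch of setting them on a connection (existing_db.db is a placeholder path and the results table is made up for illustration):

import sqlite3

con = sqlite3.connect('existing_db.db')        # placeholder path
con.execute('PRAGMA temp_store = MEMORY')      # keep temp tables/indices in RAM
con.execute('PRAGMA synchronous = OFF')        # skip fsync(); faster, less durable

# Group writes into one transaction so intermediate state is not
# flushed to disk after every statement.
with con:
    con.executemany('INSERT INTO results VALUES (?, ?)',
                    [(1, 'a'), (2, 'b')])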
I don't think this functionality is exposed in the Python module, but sqlite has a backup API that sounds like exactly what you are asking for: a way to copy from one database to another (either one of which may be an in-memory database) that works pretty much automatically without any user-visible enumeration of tables. (Maybe APSW exposes this?)
Another option is to create a ram disk (if you have sufficient control of the environment) and copy the file there.
If we must use a Python wrapper, then there is no better solution than the two-pass, read-and-write approach. But beginning with version 3.7.17, SQLite has the option of accessing disk content directly using memory-mapped I/O: sqlite mmap
If you want to use mmap, you have to use the C interface, since no wrapper provides it.
And there is another, hardware solution: the memory disk. Then you have the convenient file I/O and the speed of memory.
This has already been answered before, including code examples, at In python, how can I load a sqlite db completely to memory before connecting to it?
You do not mention the operating system, but one gotcha of Windows XP is that it defaults to a 10 MB file cache, no matter how much memory you have. (This made sense in the days when systems came with 64 MB, etc.) This message has several links:
http://marc.info/?l=sqlite-users&m=116743785223905&w=2
Here is a relatively simple way to read a SQLite db into memory. Depending on how you prefer to manipulate the data, you can either work with a Pandas DataFrame or write your table to an in-memory sqlite3 database. Similarly, after manipulating your data, you use the same df.to_sql approach to store your results back into a db table.
import sqlite3 as lite
from pandas.io.sql import read_sql
from sqlalchemy import create_engine

# In-memory SQLite database, reachable both through SQLAlchemy and
# through the underlying DBAPI connection.
engine = create_engine('sqlite://')
c = engine.connect()
conmem = c.connection

# Connection to <ait.sqlite> residing on disk
con = lite.connect('ait.sqlite', isolation_level=None)
cur = con.cursor()

# Read the SQLite table into a pandas DataFrame
sqlx = 'SELECT * FROM Table'
df = read_sql(sqlx, con, coerce_float=True, params=None)

# Write the DataFrame into the in-memory database
# (the 'flavor' argument has been removed from newer pandas versions)
df.to_sql(con=conmem, name='Table', if_exists='replace')
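For instance, once the DataFrame has been processed, the same call can push the result back into the on-disk database (Table_processed is an arbitrary target name used for illustration):

# Store the processed DataFrame back into the disk-based database.
df.to_sql(con=con, name='Table_processed', if_exists='replace', index=False)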
With the solution of Cenk Alti I always got a MemoryError with Python 3.7 when the process reached 500 MB. Only with the backup functionality of sqlite3 (mentioned by thinwybk) was I able to load and save bigger SQLite databases. You can also do the same with just 3 lines of code, both ways.
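A sketch of those two directions, assuming the on-disk file is called existing_db.db:

import sqlite3

disk = sqlite3.connect('existing_db.db')   # placeholder path
mem = sqlite3.connect(':memory:')
disk.backup(mem)                           # disk -> memory

# ... heavy calculations on mem ...

mem.backup(disk)                           # memory -> disk (overwrites the file's contents)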
The answers of @thinwybk and Crooner are both excellent.
When you have multiple connections to :memory: sqlite databases, for instance when using SQLAlchemy together with the source.backup(dest) function, you may end up not placing the backup into the "right" in-memory DB.
This can be fixed using a proper connection string: https://stackoverflow.com/a/65429612/1617295. It does not involve any hack nor use of undocumented features.
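For reference, a sketch of such a connection string following the linked answer (the name memdb1 is arbitrary and existing_db.db is a placeholder path):

import sqlite3

# Both connections address the same named, shared-cache in-memory database;
# the equivalent SQLAlchemy URL would be
# 'sqlite:///file:memdb1?mode=memory&cache=shared&uri=true'.
uri = 'file:memdb1?mode=memory&cache=shared'
con_a = sqlite3.connect(uri, uri=True)
con_b = sqlite3.connect(uri, uri=True)     # keeps the DB alive alongside con_a

disk = sqlite3.connect('existing_db.db')   # placeholder path
disk.backup(con_a)                         # now visible through con_b as well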
SQLite supports in-memory databases.
In Python, you would use a :memory: database name for that.
Perhaps you could open two databases (one from the file, an empty one in memory), migrate everything from the file database into memory, and then use the in-memory database for the calculations.
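One way to sketch that migration without going through a textual dump is to ATTACH the file database to the in-memory connection and copy each table across (existing_db.db is a placeholder; note that CREATE TABLE ... AS SELECT does not copy indexes or constraints):

import sqlite3

mem = sqlite3.connect(':memory:')
mem.execute("ATTACH DATABASE 'existing_db.db' AS src")   # placeholder path

# Recreate every ordinary table in memory and copy its rows.
tables = [row[0] for row in mem.execute(
    "SELECT name FROM src.sqlite_master "
    "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
for name in tables:
    mem.execute('CREATE TABLE "{0}" AS SELECT * FROM src."{0}"'.format(name))

mem.commit()
mem.execute("DETACH DATABASE src")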