
is there a limit to the (CSV) filesize that a Python script can read/write?

I will be writing a little Python script tomorrow, to retrieve all the data from an old MS Access database into a CSV file first, and then after some data cleansing, munging etc, I will import the data into a MySQL database on Linux.

I intend to use pyodbc to make a connection to the MS Access db. I will be running the initial script in a Windows environment.

The db has IIRC well over half a million rows of data. My questions are:

  1. Is the number of records a cause for concern? (i.e. Will I hit some limits)?
  2. Is there a better file format for the transitory data (instead of CSV)?

I chose CSV because it is quite simple and straightforward (and I am a Python newbie) - but I would like to hear from someone who may have done something similar before.


Memory usage for csv.reader and csv.writer isn't proportional to the number of records, as long as you iterate over the rows rather than loading the whole file into memory. That's one reason the iterator protocol exists. Likewise, csv.writer writes directly to disk, so it isn't limited by available memory. You can process any number of records this way.
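A minimal sketch of that streaming pattern (filenames and the sample data are hypothetical; your rows would come from the Access dump):

```python
import csv

# Create a small sample file standing in for the Access dump
# (filename and contents are hypothetical).
with open("access_dump.csv", "w", newline="") as f:
    f.write("id,name\n1,  Alice \n2, Bob\n")

# Stream rows from one CSV to another; only one row is held in
# memory at a time, so file size is bounded by disk space, not RAM.
with open("access_dump.csv", newline="") as src, \
     open("cleaned.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        # Per-row cleansing goes here, e.g. stripping whitespace.
        writer.writerow(cell.strip() for cell in row)
```

Because the reader is an iterator and the writer flushes to disk, the loop's memory footprint stays constant no matter how many rows pass through it.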

For simple data structures, CSV is fine. It's much easier to get fast, incremental access to CSV than more complicated formats like XML (tip: pulldom is painfully slow).


Yet another approach if you have Access available ...

Create a table in MySQL to hold the data.

In your Access db, create an ODBC link to the MySQL table.

Then execute a query such as:

INSERT INTO MySqlTable (field1, field2, field3)
SELECT field1, field2, field3
FROM AccessTable;

Note: This suggestion presumes you can do your data cleaning operations in Access before sending the data on to MySQL.


I wouldn't bother using an intermediate format. Pulling from Access via ADO and inserting right into MySQL really shouldn't be an issue.


The only limit should be operating system file size.

That said, when you send the data to the new database, make sure you write it a few records at a time; I've seen people try to load the entire file into memory first and then write it all at once, which causes exactly the problems you're worried about.
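A sketch of that batched-insert idea, using a small chunking helper with executemany (sqlite3 stands in for any DB-API driver such as a MySQL one; the table, column names, and batch size are hypothetical):

```python
import itertools
import sqlite3  # stands in for any DB-API driver, e.g. a MySQL one

def batched(iterable, size):
    """Yield successive lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while chunk := list(itertools.islice(it, size)):
        yield chunk

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT)")

# Any row iterator works here, e.g. a csv.reader over the cleaned file.
rows = ((i, "name%d" % i) for i in range(250))

# Insert in batches of 100 rather than one giant list or one row at a time.
for chunk in batched(rows, 100):
    conn.executemany("INSERT INTO people VALUES (?, ?)", chunk)
conn.commit()
```

Batching keeps memory flat while still amortizing the per-statement overhead of row-at-a-time inserts.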

