Updating CSV files automatically

I started learning Python recently (5 hrs ago). Here's my scenario.

I get mails every 4 hours from a remote measurement site with measurement values. The files are in *.csv format and the filenames are XX-2011-00001.csv and YY-2011-00001.csv. These are data of two instruments continuously running with different sampling intervals. The files are stored in local folders.

I want to develop a script that reads a file (example: XX-2011-00001.csv) and writes a new csv file with the same data. After 4 hours the script should run again, read only the new file XX-2011-00002.csv, and append its data to the csv file created earlier. I want the script to run in an infinite loop, so that it keeps checking for a new file and appends it to the CSV file.

The file contains ‘Date’, ‘Time’ and ‘value’ fields.

Can you please tell me which modules I should look into for writing this script? If you have any examples I would be really thankful.


The csv module will help in reading/writing your files. You'll want to use an infinite loop with a sleep -- something like:

while True:
    process_new_file()     # does nothing if no new file
    time.sleep(60)

process_new_file will need to check for new files, which can be tricky -- you don't want to try using a file before it's finished being written to! Something like this should work:

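class NoneReady(Exception):
    """Raised when no incoming file has been idle long enough."""

def check_for_new_file(directory=INCOMING, files={}):
    # files is a deliberately mutable default: it remembers
    # {name: (time_of_last_change, size)} between calls.
    for name in os.listdir(directory):
        if name in files:
            continue                     # already being tracked
        size = os.stat(os.path.join(directory, name))[stat.ST_SIZE]
        files[name] = (time.time(), size)
    now = time.time()
    for name, (last_time, last_size) in list(files.items()):
        path = os.path.join(directory, name)
        current_size = os.stat(path)[stat.ST_SIZE]
        if current_size != last_size:
            files[name] = (now, current_size)   # still growing; restart the clock
            continue
        if now - last_time >= TIME_WITH_NO_WRITES:
            del files[name]              # caller will process it and move it away
            return path
    raise NoneReady()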

Now that we have a function that will keep track of any files in the INCOMING directory, and return a filename when it's been dormant long enough to be reasonably sure it's complete, we need a function to actually process the file, then move it somewhere for safekeeping.

def process_new_file():
    try:
        filename = check_for_new_file()   # raises NoneReady if no file ready
    except NoneReady:
        return
    # newline='' lets the csv module handle line endings itself (Python 3)
    with open(filename, newline='') as in_file, \
         open(MASTER_CSV, 'a', newline='') as out_file:   # 'a' appends
        csv_file_out = csv.writer(out_file)
        for row in csv.reader(in_file):
            csv_file_out.writerow(row)
    shutil.move(filename, PROCESSED)
To put it all together, complete with imports and globals:

import csv
import os
import shutil
import stat
import time

INCOMING = '/some/path/with/new/files/'
PROCESSED = '/some/path/for/processed/files/'
MASTER_CSV = '/some/path/to/master.csv'  # placeholder, like the paths above
TIME_WITH_NO_WRITES = 600  # 10 minutes

class NoneReady(Exception):
    """Raised when no incoming file has been idle long enough."""

def check_for_new_file(directory=INCOMING, files={}):
    # files is a deliberately mutable default: it remembers
    # {name: (time_of_last_change, size)} between calls.
    for name in os.listdir(directory):
        if name in files:
            continue                     # already being tracked
        size = os.stat(os.path.join(directory, name))[stat.ST_SIZE]
        files[name] = (time.time(), size)
    now = time.time()
    for name, (last_time, last_size) in list(files.items()):
        path = os.path.join(directory, name)
        current_size = os.stat(path)[stat.ST_SIZE]
        if current_size != last_size:
            files[name] = (now, current_size)   # still growing; restart the clock
            continue
        if now - last_time >= TIME_WITH_NO_WRITES:
            del files[name]              # caller will process it and move it away
            return path
    raise NoneReady()

def process_new_file():
    try:
        filename = check_for_new_file()   # raises NoneReady if no file ready
    except NoneReady:
        return
    # newline='' lets the csv module handle line endings itself (Python 3)
    with open(filename, newline='') as in_file, \
         open(MASTER_CSV, 'a', newline='') as out_file:   # 'a' appends
        csv_file_out = csv.writer(out_file)
        for row in csv.reader(in_file):
            csv_file_out.writerow(row)
    shutil.move(filename, PROCESSED)

if __name__ == '__main__':
    while True:
        process_new_file()     # does nothing if no new file
        time.sleep(60)

This code is currently untested, so there may be a bug or two in it, and if there is an error somewhere it will stop running. Hopefully this will help get you going.


As others have said, the csv package contains great objects to handle the file I/O without writing a lot of low-level code.

However, I would implement the time requirement using a cron job rather than sleeping your application, if cron is available. It'll be more flexible, and it won't be susceptible to a single unexpected crash that stops your application while you aren't watching it.
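
With cron, the script no longer loops; it runs once, appends whatever is new, and exits. That means it has to remember between runs which files it has already handled. Here is a minimal sketch of that idea, assuming a small text file is used as the memory. The function name append_new_files and the state-file approach are illustrative choices of mine, and the master file and state file are assumed to live outside the incoming folder:

```python
import csv
import os

def append_new_files(incoming_dir, master_csv, state_file):
    """Append every not-yet-seen CSV in incoming_dir to master_csv, once.

    state_file holds one already-processed filename per line.
    Returns the list of filenames appended during this run.
    """
    done = set()
    if os.path.exists(state_file):
        with open(state_file) as f:
            done = set(line.strip() for line in f if line.strip())

    appended = []
    for name in sorted(os.listdir(incoming_dir)):
        if not name.endswith('.csv') or name in done:
            continue
        with open(os.path.join(incoming_dir, name), newline='') as src, \
             open(master_csv, 'a', newline='') as dst:
            writer = csv.writer(dst)
            for row in csv.reader(src):
                writer.writerow(row)
        # Record the file only after it was copied successfully.
        with open(state_file, 'a') as f:
            f.write(name + '\n')
        appended.append(name)
    return appended
```

A crontab entry along the lines of "0 */4 * * * python3 /path/to/append_script.py" would then run it every four hours; the script path is a placeholder.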


There is a csv module that will help you. And you'll probably want to look into time.sleep(), though there are better ways to deal with that (but given how new to the language you are, time.sleep() is probably a good starting point).


You don't need any external modules to read or write files, but importing the csv module may be advantageous depending on how you want to use your data. Check http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files for information on this. Essentially, what you want is a "while (1):" loop as the main section of the program. It will execute indefinitely until you force the program to quit or it encounters an error. You can use try/except blocks to exit gracefully, but that is beyond the scope of what you are asking.
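
For the curious, the try/except part might look roughly like the sketch below. It is only an illustration: run_forever and its max_runs parameter are names invented here (max_runs exists so the loop can be tried out without running forever), and task stands for whatever function does the actual file handling.

```python
import logging
import time

def run_forever(task, interval_seconds, max_runs=None):
    """Call task() repeatedly, sleeping interval_seconds between calls.

    max_runs is a hypothetical extra for trying the loop out; leave it as
    None to loop until interrupted. Returns how many times task() ran.
    """
    runs = 0
    try:
        while max_runs is None or runs < max_runs:
            try:
                task()
            except Exception:
                # Log the traceback and carry on with the next cycle.
                logging.exception('task failed; will retry next cycle')
            runs += 1
            time.sleep(interval_seconds)
    except KeyboardInterrupt:
        # Ctrl-C lands here, so the script ends without a traceback.
        logging.info('interrupted; exiting')
    return runs
```

The inner try keeps one bad file from killing the loop, while the outer one turns Ctrl-C into a clean exit.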

I'm assuming the naming scheme of your csv files can be determined algorithmically (since it appears to be just a date and a number). Your loop should either check for the next expected file name or look for the file with the largest number. In the latter case you would need to save the previously seen number and only execute your code when it changes.
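
As a rough sketch of that idea, the number can be pulled out of the name with a regular expression. The pattern below is a guess based on the two example names in the question (letters, a four-digit year, then a counter), and parse_name and newer_files are names made up for this example:

```python
import re

# Assumed pattern, e.g. XX-2011-00001.csv: prefix, four-digit year, counter.
NAME_RE = re.compile(r'^([A-Za-z]+)-(\d{4})-(\d+)\.csv$')

def parse_name(filename):
    """Return (prefix, year, number), or None if the name does not match."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    prefix, year, number = match.groups()
    return prefix, int(year), int(number)

def newer_files(filenames, prefix, last_seen):
    """Return the names for prefix whose (year, number) comes after last_seen.

    last_seen is a (year, number) tuple; the result is sorted oldest first.
    """
    found = []
    for name in filenames:
        parsed = parse_name(name)
        if parsed is None or parsed[0] != prefix:
            continue
        key = parsed[1:]
        if key > last_seen:
            found.append((key, name))
    return [name for key, name in sorted(found)]
```

In the main loop you would call something like newer_files(os.listdir(folder), 'XX', last_seen), process each returned file, and then update last_seen to the (year, number) of the last one.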

For information on reading and writing CSV files with the csv module, check out http://docs.python.org/library/csv.html

Edit: Forgot about the time delay. This was answered in the previous response. Use the time module and run time.sleep(x) where x is the time in seconds for the program to sleep in between iterations of the main loop.
