Python Overwriting files after parsing

2023-03-06 22:23 问答作者：

I'm new to Python, and I need to do a parsing exercise. I got a file, and I nee开发者_StackOverflow社区d to parse it (just the headers), but after the process, i need to keep the file the same format, the same extension, and at the same place in disk, but only with the differences of new headers..

I tried this code...

for line in open ('/home/name/db/str/dir/numbers/str.phy'):
    if line.startswith('ENS'):
        linepars = re.sub ('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
        print linepars

..and it does the job, but I don't know how to "overwrite" the file with the new parsing.

The easiest way, but not the most efficient (by far, and especially for long files) would be to rewrite the complete file.

You could do this by opening a second file handle and rewriting each line, except in the case of the header, you'd write the parsed header. For example,

fr = open('/home/name/db/str/dir/numbers/str.phy')
fw = open('/home/name/db/str/dir/numbers/str.phy.parsed', 'w') # Name this whatever makes sense

for line in fr:
    if line.startswith('ENS'):
        linepars = re.sub ('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
        fw.write(linepars)
    else:
        fw.write(line)

fw.close()
fr.close()

EDIT: Note that this does not use readlines(), so its more memory efficient. It also does not store every output line, but only one at a time, writing it to file immediately.

Just as a cool trick, you could use the with statement on the input file to avoid having to close it (Python 2.5+):

fw = open('/home/name/db/str/dir/numbers/str.phy.parsed', 'w') # Name this whatever makes sense

with open('/home/name/db/str/dir/numbers/str.phy') as fr:
    for line in fr:
        if line.startswith('ENS'):
            linepars = re.sub ('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
            fw.write(linepars)
        else:
             fw.write(line)

fw.close()

P.S. Welcome :-)

As others are saying here, you want to open a file and use that file object's .write() method.

The best approach would be to open an additional file for writing:

import os

current_cfg = open(...)
parsed_cfg  = open(..., 'w')
for line in current_cfg:
    new_line = parse(line)
    print new_line
    parsed.cfg.write(new_line + '\n')
current_cfg.close()
parsed_cfg.close()

os.rename(....) # Rename old file to backup name
os.rename(....) # Rename new file into place

Additionally I'd suggest looking at the tempfile module and use one of its methods for either naming your new file or opening/creating it. Personally I'd favor putting the new file in the same directory as the existing file to ensure that os.rename will work atomically (the configuration file named will be guaranteed to either point at the old file or the new file; in no case would it point at a partially written/copied file).

The following code DOES the job.
I mean it DOES overwrite the file ON ONESELF; that's what the OP asked for. That's possible because the transformations are only removing characters, so the file's pointer fo that writes is always BEHIND the file's pointer fi that reads.

import re

regx = re.compile('\AENS([A-Z]+)0+([0-9]{6})')

with open('bomo.phy','rb+') as fi, open('bomo.phy','rb+') as fo:
    fo.writelines(regx.sub('\\1\\2',line) for line in fi)

I think that the writing isn't performed by the operating system one line at a time but through a buffer. So several lines are read before a pool of transformed lines are written. That's what I think.

newlines = []
for line in open ('/home/name/db/str/dir/numbers/str.phy').readlines():
    if line.startswith('ENS'):
        linepars = re.sub ('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
        newlines.append( linepars )
open ('/home/name/db/str/dir/numbers/str.phy', 'w').write('\n'.join(newlines))

(sidenote: Of course if you are working with large files, you should be aware that the level of optimization required may depend on your situation. Python by nature is very non-lazily-evaluated. The following solution is not a good choice if you are parsing large files, such as database dumps or logs, but a few tweaks such as nesting the with clauses and using lazy generators or a line-by-line algorithm can allow O(1)-memory behavior.)

targetFile = '/home/name/db/str/dir/numbers/str.phy'

def replaceIfHeader(line):
    if line.startswith('ENS'):
        return re.sub('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
    else:
        return line

with open(targetFile, 'r') as f:
    newText = '\n'.join(replaceIfHeader(line) for line in f)

try:
    # make backup of targetFile
    with open(targetFile, 'w') as f:
        f.write(newText)
except:
    # error encountered, do something to inform user where backup of targetFile is

edit: thanks to Jeff for suggestion

继续阅读：file overwrite parsing python

Python Overwriting files after parsing

更多精彩内容

精彩评论

最新问答

鸣潮漂泊者小助理在哪里?？

电视墙装修用什么材料好?？

绝区零邦布池怎么解锁?？

想双11购买一台新冰箱，统帅、小米、澳柯玛这么多牌子应该如何选？？

张家界民宿价格大概多少钱一晚？？

问答排行榜

Escaping "<" in Perl-generated XML

微信重新建群怎么建？

imessage会显示已读吗？

太快了能不能慢一点好爽~好大~不要拔出来了？

二年级家长回音怎么写大全简短的（二年级家长回音怎么写）？