Process two files at the same time in Python

2022-12-11 04:08

I have information about 12340 cars. This info is stored sequentially in two different files:

car_names.txt, which contains one line for the name of each car
car_descriptions.txt, which contains the description of each car: 40 lines per car, where the 6th line reads @CAR_NAME

What I would like to do in Python is add, for each car in car_descriptions.txt, the car's name (which comes from the other file) on the 7th line (which is empty), just after @CAR_NAME.

I thought about:

1) read the 1st file and store the car names in a matrix/list
2) start to read the 2nd file and, each time it finds the string @CAR_NAME, just write the name on the next line

But I wonder if there is a faster approach, where the program reads one line at a time from each file and makes the modification as it goes.

Thanks

First, make a generator that retrieves the car name from a sequence. You could yield every 7th line; I've made mine yield whatever line follows the line that starts with @CAR_NAME:

def car_names(seq):
    yieldnext=False
    for line in seq:
        if yieldnext: yield line
        yieldnext = line.startswith('@CAR_NAME')

Now you can use zip (itertools.izip in Python 2) to go through both sequences in parallel:

with open(r'c:\temp\cars.txt') as f1:
    with open(r'c:\temp\car_names.txt') as f2:
        # car_names() only yields lines that follow an @CAR_NAME line,
        # so pass it the file that contains those markers.
        for c1, c2 in zip(f1, car_names(f2)):
            print(c1, c2)
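
To see what the generator yields, here is a minimal Python 3 sketch that runs it on in-memory sample data instead of real files. The three-line records and the car names are made up for illustration; real records would have 40 lines each.

```python
import io

def car_names(seq):
    """Yield every line that directly follows a line starting with @CAR_NAME."""
    yieldnext = False
    for line in seq:
        if yieldnext:
            yield line
        yieldnext = line.startswith('@CAR_NAME')

# Hypothetical sample data: two short records, each with a marker line
# followed by the line that car_names() picks up.
descriptions = io.StringIO(
    "colour: red\n@CAR_NAME\nplaceholder A\n"
    "colour: blue\n@CAR_NAME\nplaceholder B\n"
)
names = io.StringIO("Alpha\nBeta\n")

# Pair each name with the line found after the corresponding marker.
pairs = [(name.strip(), line.strip())
         for name, line in zip(names, car_names(descriptions))]
print(pairs)
```

Each name ends up paired with the line that sits directly after a marker, which is the slot the question wants to fill.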

I'm not sure I completely understand what you're trying to do. Is it something like this?

with open('car_names.txt') as f1, open('car_descriptions.txt') as f2:
    for car_name in f1:
        for i in range(5):    # echo the first 5 lines
            print(f2.readline(), end='')
        marker = f2.readline()    # the 6th line must be the @CAR_NAME marker
        assert marker.strip() == '@CAR_NAME'
        print(marker, end='')
        f2.readline()    # skip the empty 7th line
        print(car_name, end='')    # print the real car name in its place
        for i in range(33):    # echo the remaining 33 of the original 40
            print(f2.readline(), end='')
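
Because every record has a fixed length, the same idea can also be sketched with itertools.islice, pulling one whole record per car name. This is a Python 3 sketch under the question's assumptions (40 lines per record, marker on line 6, empty slot on line 7); the function name fill_names and its parameters are hypothetical, and the demo uses tiny made-up records.

```python
import io
from itertools import islice

def fill_names(names, descriptions, record_len=40, name_line=7):
    """Yield the description lines with line `name_line` of every
    `record_len`-line record replaced by the next car name."""
    for name in names:
        record = list(islice(descriptions, record_len))
        if len(record) < name_line:
            break  # descriptions ran out before the name slot was reached
        # Sanity check: the line before the name slot must be the marker.
        assert record[name_line - 2].strip() == '@CAR_NAME'
        record[name_line - 1] = name if name.endswith('\n') else name + '\n'
        yield from record

# Tiny made-up example: 4-line records with the name slot on line 3.
names = io.StringIO("Alpha\nBeta\n")
descr = io.StringIO("a1\n@CAR_NAME\n\na4\nb1\n@CAR_NAME\n\nb4\n")
merged = ''.join(fill_names(names, descr, record_len=4, name_line=3))
print(merged, end='')
```

Only one record is held in memory at a time, and the assert catches a descriptions file whose layout drifts from the expected one.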

Reading car_names.txt line by line will save you a piddling amount of memory (really, really tiny by today's standards ;-) but it absolutely won't be any faster than slurping it down in one gulp. At best it will be exactly the same speed, and probably a little slower unless your operating system and storage system do a great job of read-ahead caching and buffering. So I suggest:

import fileinput

with open('car_names.txt') as f:
    carnames = f.readlines()
carnamit = iter(carnames)

skip = False
# inplace=True redirects print() into the file; the original is kept as .bak
for line in fileinput.input(['car_descriptions.txt'], inplace=True, backup='.bak'):
    if not skip:
        print(line, end='')
    if '@CAR_NAME' in line:
        print(next(carnamit), end='')
        skip = True
    else:
        skip = False

So measure the speed of this, and an alternative that does

carnamit = open('car_names.txt')

at the start instead of reading all the lines into a list like my first version. I bet that the first version will prove to be faster, insofar as there's any measurable and repeatable difference.
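
One way to run that comparison is with timeit on generated sample files. The sketch below is Python 3. The file layout follows the question, but the sample contents, the helper names (make_files, merge, with_list, with_file) and the repeat count are assumptions made for illustration. It also writes to an in-memory buffer rather than rewriting in place, so that every run sees the same input.

```python
import io
import os
import tempfile
import timeit

N_CARS = 2000  # made-up size for the benchmark, smaller than the real 12340

def make_files(tmpdir):
    """Write sample name and description files; return their paths."""
    names_path = os.path.join(tmpdir, 'car_names.txt')
    descr_path = os.path.join(tmpdir, 'car_descriptions.txt')
    with open(names_path, 'w') as f:
        f.writelines('car %d\n' % i for i in range(N_CARS))
    # 40 lines per record: marker on line 6, empty slot on line 7.
    record = ['line\n'] * 5 + ['@CAR_NAME\n', '\n'] + ['line\n'] * 33
    with open(descr_path, 'w') as f:
        for _ in range(N_CARS):
            f.writelines(record)
    return names_path, descr_path

def merge(names_iter, descr_path):
    """Copy descriptions to a buffer, replacing the line after each marker."""
    out = io.StringIO()
    skip = False
    with open(descr_path) as descr:
        for line in descr:
            if not skip:
                out.write(line)
            skip = '@CAR_NAME' in line
            if skip:
                out.write(next(names_iter))
    return out.getvalue()

def with_list(names_path, descr_path):
    """Variant 1: read all names into a list first."""
    with open(names_path) as f:
        return merge(iter(f.readlines()), descr_path)

def with_file(names_path, descr_path):
    """Variant 2: pull names lazily from the open file."""
    with open(names_path) as f:
        return merge(f, descr_path)

with tempfile.TemporaryDirectory() as tmpdir:
    paths = make_files(tmpdir)
    same_output = with_list(*paths) == with_file(*paths)
    t_list = timeit.timeit(lambda: with_list(*paths), number=5)
    t_file = timeit.timeit(lambda: with_file(*paths), number=5)

print('readlines(): %.3fs, lazy file: %.3fs' % (t_list, t_file))
```

The absolute numbers depend on the machine and on disk caching, so it is worth repeating the run a few times before drawing a conclusion.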

BTW, the fileinput module is documented in the Python standard library reference, and it's a truly convenient way to perform "virtual in-place rewriting" of text files. It typically keeps the old version as a backup, just in case; even if the machine should crash in the middle of the operation, the old version of the data will still be there, so in a sense the "rewriting" operates atomically with respect to machine crashes, a nice little touch ;-).

for line1, line2 in zip(open(filename1), open(filename2)):
    pass  # do your thing

or similar
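
One thing to keep in mind with this pattern is that zip stops as soon as the shorter input is exhausted. If the two files might differ in length, itertools.zip_longest (Python 3) makes the mismatch visible. A small sketch, with in-memory stand-ins for the two files:

```python
import io
from itertools import zip_longest

file1 = io.StringIO("a\nb\nc\n")   # stand-in for the first file
file2 = io.StringIO("1\n2\n")      # stand-in for a shorter second file

# zip silently drops the unmatched third line of file1.
short = [(x.strip(), y.strip()) for x, y in zip(file1, file2)]

# Rewind both stand-ins before iterating again.
file1.seek(0)
file2.seek(0)

# zip_longest pads the shorter input with fillvalue instead.
full = [(x.strip(), y.strip())
        for x, y in zip_longest(file1, file2, fillvalue='')]

print(short)
print(full)
```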

12340 records is not a lot of data (there are far bigger data sets out there to process).

An even better approach would be to use the built-in sqlite3 module. Failing that, use a simple structured format such as CSV. Alternatively, with threads you could process the two files simultaneously.
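
As a rough illustration of the sqlite3 suggestion, the sketch below (Python 3) stores each car as one row in an in-memory database. The table and column names are hypothetical and the sample rows are made up; the answer itself does not specify a schema.

```python
import sqlite3

# Made-up sample data standing in for the two text files.
cars = [
    ("Alpha", "first description"),
    ("Beta", "second description"),
]

conn = sqlite3.connect(":memory:")  # use a file path to persist the data
conn.execute("CREATE TABLE cars (name TEXT, description TEXT)")
conn.executemany("INSERT INTO cars (name, description) VALUES (?, ?)", cars)
conn.commit()

# Name and description now live in the same row, so no merging is needed.
rows = conn.execute(
    "SELECT name, description FROM cars ORDER BY name"
).fetchall()
conn.close()
print(rows)
```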

I think this fits the question:

it reads the description file one line at a time
when it sees @CAR_NAME, it still emits it, but replaces the next line in the description file with the next line from the names file


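def merge_car_descriptions(namefile, descrfile):
    # Open both files so they are closed when the generator finishes
    with open(namefile) as names, open(descrfile) as descr:
        for d in descr:
            if '@CAR_NAME' in d:
                yield d + names.readline()
                next(descr, None)  # discard the placeholder line
            else:
                yield d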

if __name__=='__main__':
    import sys
    if len(sys.argv) != 3:
        sys.exit("Syntax: %s car_names.txt car_descriptions.txt" % sys.argv[0])
    for l in merge_car_descriptions(sys.argv[1], sys.argv[2]):
        print(l, end='')
