
avoiding code duplication in Python code

Consider the following Python snippet:

af=open("a",'r')
bf=open("b", 'w')

for i, line in enumerate(af):
    if i < K:
        bf.write(line)

Now, suppose I want to handle the case where K is None, so the writing continues to the end of the file. I'm currently doing

if K is None:
    for i, line in enumerate(af):
        bf.write(line)
else:
    for i, line in enumerate(af):            
        bf.write(line)
        if i==K:
            break

This clearly isn't the best way to handle this, since I'm duplicating code. Is there a more integrated way to handle it? The natural thing would be to have the if/break code present only if K is not None, but that involves writing syntax on the fly à la Lisp macros, which Python can't really do. Just to be clear, I'm not so much concerned about this particular case (which I chose partly for its simplicity) as about learning general techniques I may not be familiar with.

UPDATE: After reading answers people have posted, and doing more experimentation, here are some more comments.

As said above, I was looking for general techniques, and I think @Paul's answer, namely using takewhile from itertools, fits that best. As a bonus, it is also much faster than the naive method I listed above; I'm not sure why. I'm not really familiar with itertools, though I've looked at it a few times. From my perspective this is a case of functional programming For The Win! (Amusingly, the author of itertools once asked for feedback about dropping takewhile. See the thread beginning http://mail.python.org/pipermail/python-list/2007-December/522529.html.)

I'd simplified my situation above; the actual situation is a bit messier, since I'm writing to two different files in the loop. So the code looks more like:

for i, line in enumerate(af):
    if i < K:
        bf.write(line)
        cf.write(line.split(',')[0].strip('"')+'\n')

Given my posted example, @Jeff reasonably suggested that in the case when K was None, I just copy the file. Since in practice I am looping anyway, doing so is not such a clear choice. However, takewhile generalizes painlessly to this case. I also had another use case I did not mention here, and was able to use takewhile there too, which was nice. The second example looks like this (verbatim):

i=0
for line in takewhile(illuminacond, af):
    line_split=line.split(',')
    pid=line_split[1][0:3]
    out = line_split[1] + ',' + line_split[2] + ',' + line_split[3][1] + line_split[3][3] + ',' \
                        + line_split[15] + ',' + line_split[9] + ',' + line_split[10]
    if pid!='cnv' and pid!='hCV' and pid!='cnv':
        i = i+1
        of.write(out.strip('"')+'\n')
        tf.write(line)

Here I was able to use the condition

if K is None:
    illuminacond = lambda x: x.split(',')[0] != '[Controls]'
else:
    illuminacond = lambda x: x.split(',')[0] != '[Controls]' and i < K

per @Paul's original example. However, I'm not completely happy about the fact that I'm getting i from the outer scope, though the code works. Is there a better way of doing this? Or maybe it should be a separate question. Anyway, thanks to everyone who answered my question. Honorable mention to @Jeff, who made some nice suggestions.
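One possible way to avoid reading i from the enclosing scope, offered as a sketch rather than a drop-in replacement: let takewhile handle only the '[Controls]' test, filter out the unwanted rows in a generator expression, and cap the number of kept rows with islice. The names kept_rows and SKIP_PREFIXES and the sample rows are made up for illustration; the pid test is assumed from the checks in the loop above.

```python
from itertools import islice, takewhile

SKIP_PREFIXES = ('cnv', 'hCV')   # assumed from the pid checks in the loop above


def kept_rows(lines, k=None):
    """Return an iterator over at most k rows that pass the pid filter.

    Iteration stops at the first '[Controls]' row; k=None means no cap.
    """
    before_controls = takewhile(lambda x: x.split(',')[0] != '[Controls]', lines)
    wanted = (ln for ln in before_controls
              if ln.split(',')[1][0:3] not in SKIP_PREFIXES)
    return islice(wanted, k)


# Made-up sample rows, only to exercise the helper.
sample = [
    "x,rs1,a\n",
    "x,cnv2,b\n",
    "x,rs3,c\n",
    "x,rs4,d\n",
    "[Controls],,\n",
    "x,rs5,e\n",
]

first_two = list(kept_rows(sample, 2))
all_kept = list(kept_rows(sample))
```

With something like this, the loop would become "for line in kept_rows(af, K):" and the counter i would no longer be needed for the stopping condition.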


for i, line in enumerate(af):  
    if K is None or i < K:
        bf.write(line)
    else:
        break


itertools.takewhile will apply your condition, and then break out of the loop the first time the condition fails.

from itertools import takewhile

if K is None:
    condition = lambda x: True
else:
    condition = lambda x: x[0] < K

for i,line in takewhile(condition, enumerate(af)):
    bf.write(line)

If K is None, you don't want takewhile to ever stop, so the condition function should always return True. If you are given a numeric value for K, takewhile stops as soon as the 0th element of the tuple passed to the condition (the index produced by enumerate) is >= K.
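A self-contained way to try this out, as a minimal sketch: the helper name copy_first_lines and the in-memory io.StringIO objects are assumptions standing in for af and bf.

```python
import io
from itertools import takewhile


def copy_first_lines(src, dst, k=None):
    """Copy lines from src to dst, stopping before index k (all lines if k is None)."""
    if k is None:
        condition = lambda pair: True          # never stop
    else:
        condition = lambda pair: pair[0] < k   # pair is (index, line) from enumerate
    for _, line in takewhile(condition, enumerate(src)):
        dst.write(line)


src_text = "one\ntwo\nthree\nfour\n"

limited = io.StringIO()
copy_first_lines(io.StringIO(src_text), limited, k=2)

everything = io.StringIO()
copy_first_lines(io.StringIO(src_text), everything)
```

One thing to keep in mind: takewhile has to read the first failing item to know when to stop, so one extra line is consumed from the input file.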


Whatever K is, it's always going to be less than infinity.

if K is None:
    K = float('inf') # infinity

for i, line in enumerate(af):            
    bf.write(line)
    if i==K:
        break

Setting K = -1 works just as well (i never equals -1, so the break never fires), though it's less semantically correct. Ideally you would set K to the number of lines in af, but I presume that data is not cheaply available.
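A quick check of the sentinel idea, as a sketch that uses a list of lines in place of the file (the helper name take_through is hypothetical). Like the loop above, it keeps the line at index K before stopping.

```python
def take_through(lines, k=None):
    """Return lines up to and including index k; all lines when k is None."""
    if k is None:
        k = float('inf')   # no integer index ever equals infinity
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        if i == k:
            break
    return out


lines = ["a\n", "b\n", "c\n", "d\n"]

through_one = take_through(lines, 1)
no_limit = take_through(lines)
minus_one = take_through(lines, -1)   # -1 never matches an index either
```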


If you must loop, how about this?

from sys import maxsize  # sys.maxint no longer exists in Python 3

limit = K or maxsize  # note: K == 0 is also treated as "no limit"
for i, line in enumerate(af):
    if i >= limit: break
    bf.write(line)

Or even this?

from itertools import islice
from sys import maxsize  # sys.maxint no longer exists in Python 3

bf.writelines(islice(af, K or maxsize))
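A variation on the islice one-liner: islice treats a stop value of None as "no limit", so K can be passed straight through without a sentinel. A minimal sketch, with io.StringIO objects standing in for the files and a hypothetical helper name copy_head:

```python
import io
from itertools import islice


def copy_head(src, dst, k=None):
    """Write the first k lines of src to dst; k=None copies everything."""
    dst.writelines(islice(src, k))   # islice(src, None) iterates to the end


text = "one\ntwo\nthree\n"

head = io.StringIO()
copy_head(io.StringIO(text), head, k=2)

full = io.StringIO()
copy_head(io.StringIO(text), full)

empty = io.StringIO()
copy_head(io.StringIO(text), empty, k=0)
```

Unlike the "K or ..." idiom, this treats K = 0 as "write nothing" rather than "write everything".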

Why loop at all in the case that K is None?

from shutil import copyfile

aname = 'a'
bname = 'b'
if K is None:
    copyfile(aname, bname)
else:
    af = open(aname, 'r')
    bf = open(bname, 'w')
    for i, line in enumerate(af):
        if i < K:
            bf.write(line)


I think you're in a situation where you are going to have to accept a trade-off between DRY principles and optimization.

I would start by staying true to DRY principles and remove the duplicate code with a function like write_until...

def write_until(file_in, file_out, break_on):
    # Copy lines until break_on(i, line) says to stop.
    for i, line in enumerate(file_in):
        if break_on(i, line):
            break
        else:
            file_out.write(line)

af=open("a",'r')
bf=open("b", 'w')

if K is None:
    write_until(af,bf,lambda i,line: False)
else:
    write_until(af,bf,lambda i,line: i>K)

Then actually use the code and see if you really need to do optimizations. How much performance improvement will you honestly see from removing an if False check? If you really need that extra speed boost (which I doubt) then you'll just have to live with some code duplication.
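For reference, a runnable sketch of the write_until approach, with in-memory files (io.StringIO) as a stand-in for af and bf. The make_break_on helper is hypothetical; it just folds the "K is None" choice into one place.

```python
import io


def write_until(file_in, file_out, break_on):
    """Copy lines until break_on(i, line) returns True."""
    for i, line in enumerate(file_in):
        if break_on(i, line):
            break
        file_out.write(line)


def make_break_on(k):
    """Build the stop predicate: never stop when k is None, else stop once i > k."""
    if k is None:
        return lambda i, line: False
    return lambda i, line: i > k


text = "l0\nl1\nl2\nl3\n"

out_limited = io.StringIO()
write_until(io.StringIO(text), out_limited, make_break_on(1))

out_all = io.StringIO()
write_until(io.StringIO(text), out_all, make_break_on(None))
```

With the i > K predicate, the line at index K is still written, which matches the break-after-write loop in the question.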
