avoiding code duplication in Python code

2023-03-02 14:04 Q&A author:

Consider the following Python snippet:

af=open("a",'r')
bf=open("b", 'w')

for i, line in enumerate(af):
    if i < K:
        bf.write(line)

Now, suppose I want to handle the case where K is None, so the writing continues to the end of the file. I'm currently doing

if K is None:
    for i, line in enumerate(af):
        bf.write(line)
else:
    for i, line in enumerate(af):            
        bf.write(line)
        if i==K:
            break

This clearly isn't the best way to handle this, as I'm duplicating the code. Is there a more integrated way to handle it? The natural thing would be to have the if/break code present only when K is not None, but that involves writing syntax on the fly à la Lisp macros, which Python can't really do. Just to be clear, I'm not concerned about this particular case (which I chose partly for its simplicity) so much as learning about general techniques I may not be familiar with.

UPDATE: After reading answers people have posted, and doing more experimentation, here are some more comments.

As said above, I was looking for techniques that generalize, and I think @Paul's answer, namely using takewhile from itertools, fits that best. As a bonus, it is also much faster than the naive method I listed above; I'm not sure why. I'm not really familiar with itertools, though I've looked at it a few times. From my perspective this is a case of functional programming For The Win! (Amusingly, the author of itertools once asked for feedback about dropping takewhile. See the thread beginning at http://mail.python.org/pipermail/python-list/2007-December/522529.html.)

I'd simplified my situation above; the actual situation is a bit messier, because I'm writing to two different files in the loop. So the code looks more like:

for i, line in enumerate(af):
    if i < K:
        bf.write(line)
        cf.write(line.split(',')[0].strip('"')+'\n')

Given my posted example, @Jeff reasonably suggested that in the case when K was None, I just copy the file. Since in practice I am looping anyway, doing so is not such a clear choice. However, takewhile generalizes painlessly to this case. I also had another use case I did not mention here, and was able to use takewhile there too, which was nice. The second example looks like (verbatim):

i=0
for line in takewhile(illuminacond, af):
    line_split=line.split(',')
    pid=line_split[1][0:3]
    out = line_split[1] + ',' + line_split[2] + ',' + line_split[3][1] + line_split[3][3] + ',' \
                        + line_split[15] + ',' + line_split[9] + ',' + line_split[10]
    if pid!='cnv' and pid!='hCV' and pid!='cnv':
        i = i+1
        of.write(out.strip('"')+'\n')
        tf.write(line)

Here I was able to use the condition

if K is None:
    illuminacond = lambda x: x.split(',')[0] != '[Controls]'
else:
    illuminacond = lambda x: x.split(',')[0] != '[Controls]' and i < K

per @Paul's original example. However, I'm not completely happy about the fact that I'm getting i from the outer scope, though the code works. Is there a better way of doing this? Or maybe it should be a separate question. Anyway, thanks to everyone who answered my question. Honorable mention to @Jeff, who made some nice suggestions.
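
One possible way to avoid reading i from the enclosing scope is to let the iterator pipeline do the counting: stop at the '[Controls]' line with takewhile, filter on pid, then cap the number of kept lines with islice, which treats a stop value of None as "no limit". This is only a sketch: kept_lines is a hypothetical helper name, the sample data are made up, and only the pid filter from the loop above is reproduced, not the output formatting.

```python
from itertools import islice, takewhile

def kept_lines(lines, K=None):
    """Return an iterator over at most K lines that pass the pid filter.

    Hypothetical helper; K=None means no limit.
    """
    # Stop as soon as the first field is '[Controls]'.
    before_controls = takewhile(lambda l: l.split(',')[0] != '[Controls]', lines)
    # Keep only lines whose second field does not start with 'cnv' or 'hCV'.
    wanted = (l for l in before_controls
              if l.split(',')[1][0:3] not in ('cnv', 'hCV'))
    # islice counts the kept lines, so no outer counter is needed.
    return islice(wanted, K)

# Made-up sample data standing in for the open file af.
sample = ["x,rs1,a\n", "x,cnv2,b\n", "x,rs3,c\n", "[Controls],,\n", "x,rs4,d\n"]
first_one = list(kept_lines(sample, 1))
all_kept = list(kept_lines(sample))
```

The loop body would then iterate over kept_lines(af, K) and do both writes unconditionally.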

for i, line in enumerate(af):  
    if K is None or i < K:
        bf.write(line)
    else:
        break

itertools.takewhile will apply your condition, and then break out of the loop the first time the condition fails.

from itertools import takewhile

if K is None:
    condition = lambda x: True
else:
    condition = lambda x: x[0] < K

for i,line in takewhile(condition, enumerate(af)):
    bf.write(line)

If K is None, you don't want takewhile ever to stop, so the condition function should always return True. If you are given a numeric value for K, takewhile stops as soon as element 0 of the tuple passed to the condition (the index from enumerate) reaches K.
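
A small check of that behaviour, using an in-memory list as a stand-in for the open file (the sample lines are made up). It also shows one thing worth knowing: takewhile has to read the first element that fails the condition in order to test it, so that element is consumed from the underlying iterator.

```python
from itertools import takewhile

lines = ["a\n", "b\n", "c\n", "d\n"]  # stand-in for the open file af
K = 2

source = iter(lines)
# enumerate yields (index, line) tuples; x[0] is the index.
taken = [line for i, line in takewhile(lambda x: x[0] < K, enumerate(source))]
# "c\n" was read to evaluate the failing condition, so only "d\n" remains.
leftover = list(source)

# With an always-true condition nothing is cut off.
everything = [line for i, line in takewhile(lambda x: True, enumerate(lines))]
```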

Whatever K is, it's always going to be less than infinity.

if K is None:
    K = float('inf') # infinity

for i, line in enumerate(af):            
    bf.write(line)
    if i==K:
        break

Or, setting K = -1 works just as well, though it's less semantically correct. Ideally you would set K = max lines in af, but I presume that data is not cheaply available.
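
The sentinel idea can be sketched as a small generator. This is an illustration only: first_lines is a hypothetical name, and the limit is tested before a line is passed on, so exactly K lines come through. The i == K test after the write in the loop above lets K + 1 lines through, and K = -1 would not work as a "no limit" value with this form of test.

```python
def first_lines(lines, K=None):
    """Yield the first K items of lines, or all of them if K is None (hypothetical helper)."""
    # Every integer index compares as smaller than float('inf').
    limit = float('inf') if K is None else K
    for i, line in enumerate(lines):
        if i >= limit:  # never true when limit is infinity
            break
        yield line

two = list(first_lines(["a\n", "b\n", "c\n"], 2))
unlimited = list(first_lines(["a\n", "b\n", "c\n"]))
```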

If you must loop, how about this?

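from sys import maxsize  # sys.maxint no longer exists in Python 3

# Test for None explicitly: "K or maxsize" would also treat K == 0 as "no limit".
limit = maxsize if K is None else K
for i, line in enumerate(af):
    if i >= limit:
        break
    bf.write(line)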

Or even this?

from itertools import islice

# islice treats a stop value of None as "no limit", so K can be passed straight through.
bf.writelines(islice(af, K))
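
islice accepts None as its stop argument, meaning "no limit", so the K is None case needs no special handling, and K == 0 is not mistaken for "no limit" the way it is with an `or` fallback. A minimal sketch, with io.StringIO objects standing in for af and bf; copy_lines is a hypothetical helper name.

```python
import io
from itertools import islice

def copy_lines(src, dst, K=None):
    """Hypothetical helper: copy the first K lines of src to dst (all lines if K is None)."""
    dst.writelines(islice(src, K))

text = "one\ntwo\nthree\n"

out_all = io.StringIO()
copy_lines(io.StringIO(text), out_all)      # K is None: copy everything

out_two = io.StringIO()
copy_lines(io.StringIO(text), out_two, 2)   # first two lines only

out_zero = io.StringIO()
copy_lines(io.StringIO(text), out_zero, 0)  # K == 0: nothing is copied
```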

~~Why loop at all in the case that K is None?~~

from shutil import copyfile


aname = 'a'
bname = 'b'
if K is None:
    copyfile(aname, bname)
else:
    af = open(aname, 'r')
    bf = open(bname, 'w')
    for i, line in enumerate(af):
        if i < K:
            bf.write(line)

I think you're in a situation where you are going to have to accept a trade-off between DRY principles and optimization.

I would start by staying true to DRY principles and remove the duplicate code with a function like write_until...

def write_until(file_in, file_out, break_on):
    # Copy lines until break_on(i, line) returns True.
    for i, line in enumerate(file_in):
        if break_on(i, line):
            break
        file_out.write(line)

af=open("a",'r')
bf=open("b", 'w')

if K is None:
    write_until(af,bf,lambda i,line: False)
else:
    write_until(af,bf,lambda i,line: i>K)

Then actually use the code and see if you really need to do optimizations. How much performance improvement will you honestly see from removing an if False check? If you really need that extra speed boost (which I doubt) then you'll just have to live with some code duplication.
