Python: How expensive is it to create a small list many times?

2022-12-22 04:43 Q&A author:

I encounter the following small annoying dilemma over and over again in Python:

Option 1:

Cleaner, but slower(?) if called many times, since a_list gets re-created on each call of do_something()

def do_something():
    a_list = ["any", "think", "whatever"]
    # read something from a_list

Option 2:

Uglier, but more efficient (avoids re-creating a_list on every call)

a_list = ["any", "think", "whatever"]
def do_something():
    # read something from a_list
    pass

What do you think?

What's ugly about it?

Are the contents of the list always constants, as in your example? If so: recent versions of Python (since 2.4) will optimise that by evaluating the constant expression once and keeping the result, but only if it's a tuple. So you could change it to a tuple. Or you could stop worrying about small things like that.

Here's a list of constants and a tuple of constants:

>>> def afunc():
...    a = ['foo', 'bar', 'zot']
...    b = ('oof', 'rab', 'toz')
...    return
...
>>> import dis; dis.dis(afunc)
  2           0 LOAD_CONST               1 ('foo')
              3 LOAD_CONST               2 ('bar')
              6 LOAD_CONST               3 ('zot')
              9 BUILD_LIST               3
             12 STORE_FAST               0 (a)

  3          15 LOAD_CONST               7 (('oof', 'rab', 'toz'))
             18 STORE_FAST               1 (b)

  4          21 LOAD_CONST               0 (None)
             24 RETURN_VALUE
>>>
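The same effect can be observed without reading bytecode. The following is a minimal sketch, assuming CPython (constant folding is an implementation detail, not a language guarantee); the function names make_list and make_tuple are made up for this illustration.

```python
def make_list():
    # A list display is rebuilt on every call, so each call returns a new object.
    return ['foo', 'bar', 'zot']

def make_tuple():
    # On CPython, a tuple of constants is stored once in the code object
    # and the same object is handed back on every call.
    return ('oof', 'rab', 'toz')

# Identity checks: are two calls returning the very same object?
lists_are_shared = make_list() is make_list()
tuples_are_shared = make_tuple() is make_tuple()

# The folded tuple is visible among the function's constants.
tuple_in_consts = ('oof', 'rab', 'toz') in make_tuple.__code__.co_consts

print(lists_are_shared, tuples_are_shared, tuple_in_consts)
```

On CPython this should report that lists are not shared between calls while the tuple is. Other interpreters may behave differently.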

Never create something more than once if you don't have to. This is a simple optimization that you can do yourself, and I personally do not find the second example ugly at all.

Some may argue not to worry about optimizing little things like this but I feel that something this simple to fix should be done immediately. I would hate to see your application create multiple copies of anything that it doesn't need to simply to preserve an arbitrary sense of "code beauty". :)

Option 3:

def do_something(a_list=("any", "think", "whatever")):
    # read something from a_list
    pass

Option 3 compared to Option 1:

Both are equally readable in my opinion (though some seem to think differently in the comments! :-) ). You could even write Option 3 like this:

def do_something(
    a_list=("any", "think", "whatever")):
    # read something from a_list
    pass

which really minimizes the difference in terms of readability. Unlike Option 1, however, Option 3 defines a_list only once -- at the time when do_something is defined. That's exactly what we want.
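To see that a default argument is evaluated only once, at definition time, here is a small sketch. The helper make_default and the calls list are hypothetical names added purely to count evaluations; they are not part of the original options.

```python
calls = []

def make_default():
    # Record every evaluation of the default expression.
    calls.append(1)
    return ("any", "think", "whatever")

# The default expression runs here, once, when the def statement executes.
def do_something(a_list=make_default()):
    return a_list[0]

for _ in range(3):
    do_something()

evaluations = len(calls)
print(evaluations)  # the default was built once, not once per call
```

Because the default object is shared across calls, using an immutable tuple (as Option 3 does) avoids the usual mutable-default pitfall.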

Option 3 compared to Option 2:

Avoid global variables if possible. Option 3 allows you to do that. Also, with Option 2, over time or if other people maintain this code, the definition of a_list could get separated from def do_something. This may not be a big deal, but I think it is somewhat undesirable.

If your a_list doesn't change, move it out of the function.

You have some data
You have a method associated with it
You don't want to keep the data globally just for the sake of optimising the speed of the method unless you have to.

I think this is what classes are for.

class Processor:
    def __init__(self):
        self.data = "any thing whatever".split()

    def fun(self, arg):
        # do stuff with arg and self.data
        pass

inst = Processor()
inst.fun("skippy")

Also, if you someday want to separate out the data into a file, you can just modify the constructor to do so.
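As a rough sketch of that idea, the constructor could take an optional file path. Everything specific here is an assumption made for illustration: the path parameter, the whitespace-separated file format, the DEFAULT_DATA name, and the membership check inside fun.

```python
class Processor:
    # Hypothetical built-in data, used when no file is given.
    DEFAULT_DATA = ("any", "thing", "whatever")

    def __init__(self, path=None):
        if path is None:
            self.data = list(self.DEFAULT_DATA)
        else:
            # Assumed format: words separated by whitespace.
            with open(path) as handle:
                self.data = handle.read().split()

    def fun(self, arg):
        # Placeholder behaviour: report whether arg is in the data.
        return arg in self.data

inst = Processor()
found = inst.fun("any")
print(found)
```

Callers of fun do not change when the data source does; only the constructor call differs.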

Well, it seems it comes down to whether or not the list is initialized inside the function:

import time

def fun1():
    # Same loop as fun2, plus one small list creation per call
    a = ['any', 'think', 'whatever']
    total = 0
    for i in range(100):
        total += i

def fun2():
    total = 0
    for i in range(100):
        total += i


def test_fun(fun, times):
    # Time `times` consecutive calls of fun
    start = time.time()
    for i in range(times):
        fun()
    end = time.time()
    print("Function took %s" % (end - start))

# Test
print('warming up')
test_fun(fun1, 100)
test_fun(fun2, 100)

print('Testing fun1')
test_fun(fun1, 100000)
print('Testing fun2')
test_fun(fun2, 100000)

print('Again')
print('Testing fun1')
test_fun(fun1, 100000)
print('Testing fun2')
test_fun(fun2, 100000)

and the results:

>python test.py
warming up
Function took 0.000604152679443
Function took 0.000600814819336
Testing fun1
Function took 0.597407817841
Testing fun2
Function took 0.580779075623
Again
Testing fun1
Function took 0.595198154449
Testing fun2
Function took 0.580571889877

Looks like there is no meaningful difference: fun1 is only about 0.015 to 0.017 seconds slower over 100,000 calls.
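For a measurement that isolates the list creation from the loop, the standard timeit module is an alternative to hand-rolled timing. This is only a sketch: the function names with_list and without_list and the call count n are chosen for illustration, and the numbers will vary by machine and Python version.

```python
import timeit

def with_list():
    # Creates the small list on every call.
    a = ['any', 'think', 'whatever']
    return None

def without_list():
    # Baseline: the cost of a bare function call.
    return None

n = 100000
t_with = timeit.timeit(with_list, number=n)
t_without = timeit.timeit(without_list, number=n)

print("with list:    %.6f s for %d calls" % (t_with, n))
print("without list: %.6f s for %d calls" % (t_without, n))
```

The gap between the two totals, divided by n, gives a rough per-call cost of building the list.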

I've worked on automated systems that process 100,000,000+ records a day, where a 1% performance improvement is huge.

I learned a big lesson working on that system: Faster is better, but only when you know when it's fast enough.

A 1% improvement would have been a huge reduction in total processing time, but it isn't enough to affect when we would need our next hardware upgrade. My application was so fast that the time I spent trying to milk that last 1% probably cost more than a new server would have.

In your case, you would have to call do_something tens of thousands of times before it made a significant difference in performance. In some cases that would matter; in others it won't.

If the list is never modified, why do you use lists at all?

Without knowing your actual requirements, I'd recommend simply using a few if-statements to get rid of the list and the "read something from list" part completely.
