Can this Python code be written more efficiently?
So I have this Python code that writes some values to a dictionary, where each key is a student ID number and each value is an instance of a Student class, and each instance has some variables (attributes) associated with it.
Code
# Assign var2
try:
    if (str(i) in row_num_id.iterkeys()) and (row_num_id[str(i)] == varschosen[1]):
        valuetowrite = str(row[i])
        if students[str(variablekey)].var2 != []:
            students[str(variablekey)].var2.append(valuetowrite)
        else:
            students[str(variablekey)].var2 = [valuetowrite]
except:
    two = 1  # This is just a dummy assignment because I can't leave it empty...
             # I don't need my program to do anything if the "try" doesn't work.
             # I just want to prevent a crash.

# Assign var3
try:
    if (str(i) in row_num_id.iterkeys()) and (row_num_id[str(i)] == varschosen[2]):
        valuetowrite = str(row[i])
        if students[str(variablekey)].var3 != []:
            students[str(variablekey)].var3.append(valuetowrite)
        else:
            students[str(variablekey)].var3 = [valuetowrite]
except:
    two = 1

# Assign var4
try:
    if (str(i) in row_num_id.iterkeys()) and (row_num_id[str(i)] == varschosen[3]):
        valuetowrite = str(row[i])
        if students[str(variablekey)].var4 != []:
            students[str(variablekey)].var4.append(valuetowrite)
        else:
            students[str(variablekey)].var4 = [valuetowrite]
except:
    two = 1
The same code repeats many, many times, once for each variable the student has (var5, var6, ... varX). The RAM spike in my program occurs while I execute the function that performs this series of variable assignments.
I'd like to find a way to make this faster or more memory-efficient, because running this part of my program takes up around half a gig of memory. :(
Thanks for your help!
EDIT:
Okay, let me simplify my question: in my case, I have a dictionary of about 6000 class instances, where each instance has 1000 attributes, all of type string or list of strings. I don't really care about the number of lines in my code or the speed at which it runs (right now, my code is at almost 20,000 lines and is about a 1 MB .py file!). What I am concerned about is the amount of memory it is taking up because this is the culprit in throttling my CPU. The ultimate question is: does the number of code lines by which I build up this massive dictionary matter so much in terms of RAM usage?
My original code functions fine, but the RAM usage is high. I'm not sure if that is "normal" for the amount of data I am collecting. Does writing the code in a condensed fashion (as shown by the people who helped me below) actually make a noticeable difference in the amount of RAM I am going to eat up? Sure, there are X ways to build a dictionary, but does the choice even affect the RAM usage in this case?
Edit: The suggested code refactoring below won't reduce the memory consumption very much. 6000 instances, each with 1000 attributes, may very well consume half a gig of memory.
You might be better off storing the data in a database and pulling out the data only as you need it via SQL queries. Or you might use shelve or marshal to dump some or all of the data to disk, where it can be read back in only when needed. A third option would be to use a numpy array of strings. The numpy array will hold the strings more compactly. (Python strings are objects with lots of methods which make them bulkier memory-wise. A numpy array of strings loses all those methods but requires relatively little memory overhead.) A fourth option might be to use PyTables.
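For instance, here is a minimal sketch of the shelve approach; the Student class, the 'students.db' filename, and the field names are placeholders for whatever your real program uses:

import shelve

class Student(object):
    def __init__(self):
        self.var2 = []   # stand-ins for your real attributes
        self.var3 = []

# Write each record to disk instead of holding all 6000 in RAM.
db = shelve.open('students.db')
s = Student()
s.var2.append('some value')
db['12345'] = s          # key is the student ID; the value is pickled to disk
db.close()

# Later, read back only the record you need.
db = shelve.open('students.db')
print db['12345'].var2
db.close()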
And last (but not least), there might be ways to redesign your algorithm to be less memory-intensive. We'd have to know more about your program and the problem it's trying to solve to give more concrete advice.
Original suggestion:
for n, v in enumerate(('var2', 'var3', 'var4')):
    try:
        if row_num_id.get(str(i)) == varschosen[n + 1]:
            valuetowrite = str(row[i])
            value = getattr(students[str(variablekey)], v)
            if value != []:
                value.append(valuetowrite)
            else:
                setattr(students[str(variablekey)], v, [valuetowrite])
    except PUT_AN_EXPLICIT_EXCEPTION_HERE:
        pass
PUT_AN_EXPLICIT_EXCEPTION_HERE should be replaced with something like AttributeError, TypeError, or ValueError, or maybe something else. It's hard to guess what to put there because I don't know what kind of values the variables might have.
If you run the code without the try...except block and your program crashes, take note of the traceback error message you receive. The last line will say something like

TypeError: ...

In that case, replace PUT_AN_EXPLICIT_EXCEPTION_HERE with TypeError.
If your code can fail in a number of ways, say with TypeError or ValueError, then you can replace PUT_AN_EXPLICIT_EXCEPTION_HERE with (TypeError, ValueError) to catch both kinds of error.
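Once you know which exception to catch, the same loop extends to var5 through varX without copying the block a thousand times; a hedged sketch, assuming the attribute names continue to track the varschosen indices exactly as they do above:

for n in range(1, len(varschosen)):
    v = 'var%d' % (n + 1)            # 'var2', 'var3', ..., 'varX'
    try:
        if row_num_id.get(str(i)) == varschosen[n]:
            valuetowrite = str(row[i])
            value = getattr(students[str(variablekey)], v)
            if value != []:
                value.append(valuetowrite)
            else:
                setattr(students[str(variablekey)], v, [valuetowrite])
    except (TypeError, ValueError):  # or whatever exception you determined above
        pass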
Note: there is a little technical caveat that should be mentioned regarding row_num_id.get(str(i))==varschosen[1]. The expression row_num_id.get(str(i)) returns None if str(i) is not in row_num_id. But what if varschosen[1] is None and str(i) is not in row_num_id? Then the condition is True, whereas the longer original condition returned False.

If that is a possibility, then the solution is to use a sentinel default value, as in row_num_id.get(str(i),object())==varschosen[1]. Now row_num_id.get(str(i),object()) returns a fresh object() when str(i) is not in row_num_id. Since object() is a new instance of object, there is no way it could equal varschosen[1].
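A toy demonstration of the pitfall and the fix, with made-up values:

row_num_id = {'0': 'name'}   # str(i) will not be found below
varschosen = [None, None]    # suppose varschosen[1] happens to be None
i = 5

print row_num_id.get(str(i)) == varschosen[1]            # True -- a false match!
print row_num_id.get(str(i), object()) == varschosen[1]  # False -- correct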
You've spelled this wrong:

    two = 1  # This is just a dummy assignment...

It's spelled:

    pass
You should read a tutorial on Python.
Also, a bare except: is a bad policy. Your program will fail to crash when it's supposed to crash.
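A toy illustration of the failure mode, with a deliberately misspelled name standing in for a real bug:

values = []
try:
    values.append(valuetowrte)  # NameError: 'valuetowrte' is misspelled
except:
    pass                        # the bare except silently swallows the typo
print values                    # prints [] -- the program "works", wrongly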
Names like var2 and var3 are evil. They are intentionally misleading. And don't repeat str(variablekey) over and over again; do the lookup once and reuse the result, as in the sketch below.
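A minimal sketch of that advice; the Student class and the values here are made up for illustration:

class Student(object):
    def __init__(self):
        self.var2 = []
        self.var3 = []

students = {'42': Student()}
variablekey = 42
valuetowrite = 'some value'

student = students[str(variablekey)]  # convert and look up exactly once
student.var2.append(valuetowrite)     # then reuse the local name
student.var3.append(valuetowrite)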
I'd like to find a way to make this faster or more memory-efficient, because running this part of my program takes up around half a gig of memory. :(
This request is unanswerable because we don't know what it's supposed to do. Intentionally obscure names like var1 and var2 make it impossible to understand.
"6000 instantiated classes, where each class has 1000 attributed variables"
So. 6 million objects? That's a lot of memory. A real lot of memory.
What I am concerned about is the amount of memory it is taking up because this is the culprit in throttling my CPU
Really? Any evidence?
but the RAM usage is high
Compared with what? What's your basis for this claim?
Python dicts use a surprisingly large amount of memory. Try:
import sys
for i in range( 30 ):
    d = dict( ( j, j ) for j in range( i ) )
    print "dict with", i, "elements is", sys.getsizeof( d ), "bytes"
for an illustration of just how expensive they are. Note that this is just the size of the dict itself: it doesn't include the size of the keys or values stored in the dict.
By default, an instance of a Python class stores its attributes in a dict. Therefore, each of your 6000 instances is using a lot of memory just for that dict.
One way that you could save a lot of memory, provided that your instances all have the same set of attributes, is to use __slots__ (see http://docs.python.org/reference/datamodel.html#slots). For example:

class Foo( object ):
    __slots__ = ( 'a', 'b', 'c' )
Now, instances of class Foo have space allocated for precisely three attributes, a, b, and c, but no instance dict in which to store any other attributes. This uses only 4 bytes (on a 32-bit system) per attribute, as opposed to perhaps 15-20 bytes per attribute using a dict.
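A rough way to see the difference for yourself; this is only a sketch, and the exact byte counts vary by Python version and platform:

import sys

class Plain( object ):
    pass                             # attributes live in an instance dict

class Slotted( object ):
    __slots__ = ( 'a', 'b', 'c' )    # no instance dict at all

p = Plain()
p.a, p.b, p.c = 1, 2, 3
s = Slotted()
s.a, s.b, s.c = 1, 2, 3

print sys.getsizeof( p ) + sys.getsizeof( p.__dict__ )  # instance plus its dict
print sys.getsizeof( s )                                # slotted instance only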
Another way in which you could be wasting memory, given that you have a lot of strings, is if you're storing multiple identical copies of the same string. Using the intern function (see http://docs.python.org/library/functions.html#intern) could help if this turns out to be a problem.
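A minimal sketch of interning (in Python 2, intern is a builtin; it moved to sys.intern in Python 3); the string values are made up:

part = ' answer'
a = intern( 'student answer' )
b = intern( 'student' + part )   # concatenated at runtime, then interned
print a is b                     # True: both names share one string object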