Python's list comprehensions and other better practices
This relates to a project to convert a 2-way ANOVA program in SAS to Python.
I pretty much started trying to learn the language Thursday, so I know I have a lot of room for improvement. If I'm missing something blatantly obvious, by all means, let me know. I haven't got Sage up and running yet, nor numpy, so right now, this is all quite vanilla Python 2.6.1. (portable)
Primary query: Need a good set of list comprehensions that can extract the data in lists of samples in lists by factor A, by factor B, overall, and in groups of each level of factors A&B (AxB).
After some work, the data is in the following form (3 layers of nested lists):
response[a][b][n]
(meaning [a1 [b1 [n1, ... ,nN] ...[bB [n1, ...nN]]], ... ,[aA [b1 [n1, ... ,nN] ...[bB [n1, ...nN]]] Hopefully that's clear.)
Factor levels in my example case: A=3 (0-2), B=8 (0-7), N=8 (0-7)
byA= [[a[i] for i in range(b)] for a[b] in response]
(Can someone explain why this syntax works? I stumbled into it trying to see what the parser would accept. I haven't seen that syntax attached to that behavior elsewhere, but it's really nice. Any good links on sites or books on the topic would be appreciated. Edit: Persistence of variables between runs explained this oddity. It doesn't work.)
byB=lstcrunch([[Bs[i] for i in range(len(Bs)) ]for Bs in response])
(It bears noting that zip(*response) almost does what I want. The above version isn't actually working, as I recall. I haven't run it through a careful test yet.)
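For reference, here is a small sketch of what zip(*response) produces and how it could be turned into a by-B grouping. The 2x3x2 sample data and the names transposed and by_b are made up for illustration; this is one possible approach, not a tested replacement for the line above.

```python
# Hypothetical sample: A=2 levels, B=3 levels, N=2 replicates.
response = [
    [[1, 2], [3, 4], [5, 6]],        # a = 0
    [[7, 8], [9, 10], [11, 12]],     # a = 1
]

# zip(*response) swaps the A and B axes: one tuple per level of B,
# each holding that level's replicate list from every level of A.
transposed = list(zip(*response))

# Flatten each tuple of replicate lists to get all samples per level of B.
by_b = [[rep for cell in cells for rep in cell] for cells in transposed]

print(by_b)
```

So zip(*response) gets the grouping right but leaves one extra level of nesting (and tuples instead of lists), which is probably why it only "almost" works.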
byAxB= [item for sublist in response for item in sublist]
(Stolen from a response by Alex Martelli on this site. Again could someone explain why? List comprehension syntax is not very well explained in the texts I've been reading.)
ByO= [item for sublist in byAxB for item in sublist]
(Obviously, I simply reused the former comprehension here, 'cause it did what I need. Edit:)
I'd like these to end up the same datatypes, at least when looped through by the factor in question, s.t. that same average/sum/SS/et cetera functions can be applied and used.
This could easily be replaced by something cleaner:
def lstcrunch(Dlist):
    """Returns a list containing the entire
    contents of whatever is imported,
    reduced by one level.
    If a rectangular array, it reduces a dimension by one.
    lstcrunch(DataSet[a][b]) -> DataOutput[a]
    [[1, 2], [[2, 3], [2, 4]]] -> [1, 2, [2, 3], [2, 4]]
    """
    flat=[]
    if islist(Dlist):#1D top level list
        for i in Dlist:
            if islist(i):
                flat+= i
            else:
                flat.append(i)
        return flat
    else:
        return [Dlist]
Oh, while I'm on the topic, what's the preferred way of identifying a variable as a list? I have been using:
def islist(a):
    "Returns 'True' if input is a list and 'False' otherwise"
    return type(a)==type([])
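One commonly suggested alternative is isinstance, which also accepts subclasses of list. A minimal sketch follows; the MyList subclass is hypothetical and only there to show the difference from the type() comparison.

```python
def islist(a):
    """Return True if a is a list (or an instance of a list subclass)."""
    return isinstance(a, list)


class MyList(list):   # hypothetical subclass, only for the demonstration
    pass


print(islist([1, 2]))      # a plain list
print(islist((1, 2)))      # a tuple is not a list
print(islist(MyList()))    # accepted here; type(a) == type([]) rejects it
```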
Parting query: Is there a way to explicitly force a shallow copy to become a deep copy? Or, similarly, when copying into a variable, is there a way of declaring that the assignment is supposed to replace the pointer, too, and not merely the value (s.t. the assignment won't propagate to other shallow copies)? The sharing might be useful from time to time as well, so being able to control when it does or doesn't occur sounds really nice. (I really stepped all over myself when I prepared my table for inserting by calling: response=[[[0]*N]*B]*A )
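On the table-building problem specifically, a short sketch of why the repetition form misbehaves and one way around it. The names shared and independent are made up; A, B and N use the factor levels from the example case.

```python
A, B, N = 3, 8, 8   # factor levels from the example case

# Repetition copies references: every row and cell is the same object.
shared = [[[0] * N] * B] * A
shared[0][0][0] = 99
print(shared[2][7][0])        # the write shows up in every cell

# Nested comprehensions build a fresh inner list on every pass.
independent = [[[0] * N for _ in range(B)] for _ in range(A)]
independent[0][0][0] = 99
print(independent[2][7][0])   # other cells are untouched
```

The innermost [0] * N is harmless because integers are immutable; only the repeated lists cause the aliasing.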
Edit: Further investigation led to most of this working fine. I've since made the class and tested it. It works fine. I'll leave the list comprehension forms intact for reference.
def byB(array_a_b_c):
    y=range(len(array_a_b_c))
    x=range(len(array_a_b_c[0]))
    return [[array_a_b_c[i][j][k]
             for k in range(len(array_a_b_c[0][0]))
             for i in y]
            for j in x]

def byA(array_a_b_c):
    return [[repn for rowB in rowA for repn in rowB]
            for rowA in array_a_b_c]

def byAxB(array_a_b_c):
    return [rowB for rowA in array_a_b_c
            for rowB in rowA]

def byO(array_a_b_c):
    return [rep
            for rowA in array_a_b_c
            for rowB in rowA
            for rep in rowB]
def gen3d(row, col, inner):
    """Produces a 3d nested array without any naughty shallow copies.
    [row[col[inner]] named s.t. the outer can be split on, per lprn for easy display"""
    return [[[k for k in range(inner)]
             for i in range(col)]
            for j in range(row)]
def lprn(X):
    """This prints a list by lines.
    Not fancy, but works"""
    if isiterable(X):
        for line in X: print line
    else:
        print X   # was "print x", an undefined name

def isiterable(a):
    return hasattr(a, "__iter__")
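To illustrate the stated goal of applying the same summary function to every grouping, here is a small sketch. The 2x2x2 sample data and the mean helper are made up for illustration, and the groupings are written inline with the same shapes as byA, byB, byAxB and byO rather than calling those functions.

```python
def mean(values):
    """Arithmetic mean of a flat list of numbers (hypothetical helper)."""
    return sum(values) / float(len(values))


# Hypothetical 2 x 2 x 2 sample: response[a][b][n]
response = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
]

# Each grouping is a list of flat sample lists (overall is one flat list).
by_a = [[rep for row_b in row_a for rep in row_b] for row_a in response]
by_b = [[rep for row_a in response for rep in row_a[j]]
        for j in range(len(response[0]))]
by_axb = [row_b for row_a in response for row_b in row_a]
overall = [rep for row_a in response for row_b in row_a for rep in row_b]

print([mean(g) for g in by_a])     # [2.5, 6.5]
print([mean(g) for g in by_b])     # [3.5, 5.5]
print([mean(g) for g in by_axb])   # [1.5, 3.5, 5.5, 7.5]
print(mean(overall))               # 4.5
```

Because every grouping except the overall one is a list of flat lists, the same mean (or sum, or SS) function can be mapped over each of them unchanged.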
Thanks to everyone who responded. Already see a noticeable improvement in code quality due to improvements in my gnosis. Further thoughts are still appreciated, of course.
byAxB= [item for sublist in response for item in sublist]
Again could someone explain why?
I am sure A.M. will be able to give you a good explanation. Here is my stab at it while waiting for him to turn up.
I would approach this from left to right. Take these four words:
for sublist in response
I hope you can see the resemblance to a regular for loop. These four words are doing the groundwork for performing some action on each sublist in response. It appears that response is a list of lists. In that case sublist would be a list for each iteration through response.
for item in sublist
This is again another for loop in the making. Given that we first heard about sublist in the previous "loop", this would indicate that we are now traversing through sublist, one item at a time. If I were to write these loops out without comprehensions it would look like this:
for sublist in response:
    for item in sublist:
Next, we look at the remaining words: [, item and ]. This effectively means: collect the items in a list and return the resulting list.
Whenever you have trouble creating or understanding list comprehensions, write the relevant for loops out and then compress them:
result = []
for sublist in response:
    for item in sublist:
        result.append(item)
This will compress to:
[
item
for sublist in response
for item in sublist
]
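If you want to convince yourself that the two forms are equivalent, a quick check like the following should do. The sample data and the name flattened are made up for illustration.

```python
response = [[1, 2], [3], [4, 5, 6]]   # made-up sample data

# Explicit loops.
result = []
for sublist in response:
    for item in sublist:
        result.append(item)

# The same loops compressed: the for clauses keep their original order.
flattened = [item for sublist in response for item in sublist]

print(result == flattened)
```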
List comprehension syntax is not very well explained in the texts I've been reading
Dive Into Python has a section dedicated to list comprehensions. There is also this nice tutorial to read through.
Update
I forgot to say something. List comprehensions are another way of achieving what has traditionally been done using map and filter. It would be a good idea to understand how map and filter work if you want to improve your comprehension-fu.
For the copy part, look into the copy module. Python assignment only binds another reference to the same object, so a change made through any of the "copies" propagates back to the original. The copy module makes real copies of objects, and you can choose between a shallow copy (copy.copy) and a deep copy (copy.deepcopy).
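A minimal sketch of the difference between the copy modes; the variable names and data are made up for illustration.

```python
import copy

original = [[1, 2], [3, 4]]

alias = original                  # no copy at all, just a second name
shallow = copy.copy(original)     # new outer list, shared inner lists
deep = copy.deepcopy(original)    # new outer list and new inner lists

original[0][0] = 99

print(alias[0][0])     # sees the change
print(shallow[0][0])   # also sees it, the inner list is still shared
print(deep[0][0])      # unaffected
```

So "converting" a shallow copy into a deep one amounts to calling copy.deepcopy on it before any further changes are made.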
It is sometimes tricky to produce the right level of nesting in your data structure, but I think in your case it should be relatively simple. To try it out we need some sample data, say:
data = [[a, [b, range(1, 9)]]
        for b in range(8)
        for a in range(3)]
print 'Origin'
print(data)
print 'Flat'
## from this we see how to produce the c data flat
print([(a,b,c) for a,[b,c] in data])
## the for clauses run left to right, so c must be bound before it is iterated
print "Sum of data in third level = %f" % sum(point for a,[b,c] in data for point in c)
print "Sum of all data = %f" % sum(a+b+sum(c) for a,[b,c] in data)
For the type check, generally you should avoid it, but if you must (for example, when you do not want to recurse into strings) you can do it like this:
if not isinstance(data, basestring) : ....
If you need to flatten a structure, you can find useful code in the Python documentation, reproduced below. Other ways to express it are chain(*listOfLists) or the list comprehension [d for sublist in listOfLists for d in sublist]:
from itertools import chain

def flatten(listOfLists):
    "Flatten one level of nesting"
    return chain.from_iterable(listOfLists)
This does not work, though, if you have data nested at different depths. For a heavyweight flattener see: http://www.python.org/workshops/1994-11/flatten.py
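For data nested at different depths, a small recursive sketch along these lines might be enough. Note that deep_flatten and string_types are names made up here; this is not the code from the linked flatten.py, and it assumes strings should be kept whole as leaf values.

```python
try:
    string_types = basestring      # Python 2
except NameError:
    string_types = str             # Python 3


def deep_flatten(data):
    """Return a flat list of the leaves of an arbitrarily nested sequence."""
    flat = []
    for item in data:
        is_string = isinstance(item, string_types)
        if hasattr(item, "__iter__") and not is_string:
            flat.extend(deep_flatten(item))   # recurse into nested containers
        else:
            flat.append(item)                 # leaf value, strings included
    return flat


print(deep_flatten([1, [2, [3, "ab"]], (4, 5)]))
```

Very deep nesting would run into Python's recursion limit, so this is only meant for modest structures like the 3-level response table.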