开发者

Python: for loop in index assignment

While working through the awesome book "Programming Collective Intelligence", by Toby Segaran, I've encountered some techniques in index assignments I'm not entirely familiar with.

Take this for example:

createkey='_'.join(sorted([str(wi) for wi in wordids]))

or:

normalizedscores = dict([(u,float(l)/maxscore) for (u,l) in linkscores.items()])

All the nested tuples in the indexes开发者_JAVA技巧 have me a bit confused. What is actually being assigned to these varibles? I assumed obviously the .join one comes out as a string, but what about the latter? If someone could explain the mechanics of these loops I'd really appreciate it. I assume these are pretty common techniques, but being new to Python, I suppose to ask is a moment's shame. Thanks!


[str(wi) for wi in wordids]

is a list comprehension.

a = [str(wi) for wi in wordids]

is the same as

a = []
for wi in wordids:
    a.append(str(wi))

So

createkey='_'.join(sorted([str(wi) for wi in wordids]))

creates a list of strings from each item in wordids, then sorts that list and joins it into a big string using _ as a separator.

As agf rightly noted, you can also use a generator expression, which looks just like a list comprehension but with parentheses instead of brackets. This avoids construction of a list if you don't need it later (except for iterating over it). And if you already have parentheses there like in this case with sorted(...) you can simply remove the brackets.

However, in this special case you won't be getting a performance benefit (in fact, it'll be about 10 % slower; I timed it) because sorted() will need to build a list anyway, but it looks a bit nicer:

createkey='_'.join(sorted(str(wi) for wi in wordids))

normalizedscores = dict([(u,float(l)/maxscore) for (u,l) in linkscores.items()])

iterates through the items of the dictionary linkscores, where each item is a key/value pair. It creates a list of key/l/maxscore tuples and then turns that list back into a dictionary.

However, since Python 2.7, you could also use dict comprehensions:

normalizedscores = {u:float(l)/maxscore for (u,l) in linkscores.items()}

Here's some timing data:

Python 3.2.2

>>> import timeit
>>> timeit.timeit(stmt="a = '_'.join(sorted([str(x) for x in n]))", setup="import random; n = [random.randint(0,1000) for i in range(100)]")
61.37724242267409
>>> timeit.timeit(stmt="a = '_'.join(sorted(str(x) for x in n))", setup="import random; n = [random.randint(0,1000) for i in range(100)]")
66.01814811313774

Python 2.7.2

>>> import timeit
>>> timeit.timeit(stmt="a = '_'.join(sorted([str(x) for x in n]))", setup="import random; n = [random.randint(0,1000) for i in range(100)]")
58.01728623923137
>>> timeit.timeit(stmt="a = '_'.join(sorted(str(x) for x in n))", setup="import random; n = [random.randint(0,1000) for i in range(100)]")
60.58927580777687


Let's take the first one:

  1. str(wi) for wi in wordids takes each element in wordids and converts it to string.
  2. sorted(...) sorts them (lexicographically).
  3. '_'.join(...) merges the sorted word ids into a single string with underscores between entries.

Now the second one:

normalizedscores = dict([(u,float(1)/maxscore) for (u,l) in linkscores.items()])
  1. linkscores is a dictionary (or a dictionary-like object).
  2. for (u,l) in linkscores.items() iterates over all entries in the dictionary, for each entry assigning the key and the value to u and l.
  3. (u,float(1)/maxscore) is a tuple, the first element of which is u and the second element is 1/maxscore (to me, this looks like it might be a typo: float(l)/maxscore would make more sense -- note the lowercase letter el in place of one).
  4. dict(...) constructs a dictionary from the list of tuples, where the first element of each tuple is taken as the key and the second is taken as the value.

In short, it makes a copy of the dictionary, preserving the keys and dividing each value by maxscore.


The latter is equivalent to:

normalizedscores = {}
for u, l in linkscores.items():
    normalizedscores[u] = float(l) / maxscore


[(u,float(1)/maxscore) for (u,l) in linkscores.items()]

This creates a list by iterating over the tuples in linkscores.items() and computing (u, float(l)/maxscore) for each tuple.

dict([this list])

creates a dict with entries from the result of the list comprehension - (u, float(l)/maxscore) for each item in linkscores.

As another example of creating a dict from a list of tuples:

>>> l = [(1,2), (3,4), (5,6)]
>>> d = dict(l)
>>> d
{1: 2, 3: 4, 5: 6}


Here is an example of the first...example

>>> wordids = [1,2,4,3,10,7]
>>> createkey='_'.join(sorted([str(wi) for wi in wordids]))
>>> print createkey
1_10_2_3_4_7

What it is doing is iterating through the list with a for loop, sorting the list, then joins all of the sorted values into a string, separating values with '_'


The weird-looking business happening inside the [] brackets is called a list comprehension, and it is basically a really concise way of building a list. myList = [str(wi) for wi in wordids] is equivalent to:

myList = []

for wi in wordids:
  myList.append(str(wi))

sorted() then sorts that list, and join() gives a string with those list items separated by underscores, like this: item1_item2_item3_....

The second assignment is more complicated/concise, but here's what's going on:

  • linkscores looks like a dictionary, and the items() method returns a list of (key, value) tuples from the dictionary. So for (u,l) in linkscores.items() is looping over that list.
  • For each of those tuples, we create a new tuple containing (u, float(l)/maxscore), and add it to the list. So this step basically changes your (item, value) list to a list of (item, normalized value) tuples.
  • The dict() function turns that back into a dictionary.

The overall result of this is to take all the values in the dict and normalize them. There may be an easier/more verbose way to do this, but this way has the benefit of looking cool. I prefer to not do crazy stuff with list comprehensions because it hurts readability, so don't feel bad at all if you don't feel like writing this kind of thing yourself!

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜