
Order of files downloaded by a multithreaded program is not constant

I'm using the program from here

to download many URLs at once. It works fine, but the order of the results in the queue I receive is not the same as the order of the URLs in the urls list, and it's also not constant (it changes from run to run).

What can I do to either make their order constant, or to know which URL belongs to which item in the queue I receive?

Thanks.


Change fetch to read like this:

def fetch(url):
    return (url, urllib2.urlopen(url).read())

Then, instead of a queue full of strings, each containing a result, you get a queue full of tuples, each containing the URL and then its result.

You aren't going to be able to get back a queue in which the items are always in the same order, because thread scheduling is not deterministic: whichever download finishes first puts its result in the queue first. So the best thing to do is to tag each result so you can identify it later.
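A minimal, self-contained sketch of this approach, written in Python 3 (`urllib2` became `urllib.request` there). The original program is only linked, so the thread setup below is an assumption, and `fake_fetch` is a hypothetical stand-in for `urllib2.urlopen(url).read()` so the example runs without network access. The queue still fills in whatever order the threads finish; the URL tags are what let you restore the input order afterwards.

```python
import queue
import random
import threading
import time

def fake_fetch(url):
    """Hypothetical stand-in for urllib2.urlopen(url).read(); no network access."""
    time.sleep(random.uniform(0, 0.05))  # simulate a variable download time
    return "<contents of %s>" % url

def fetch(url):
    # Tag the result with the URL that produced it.
    return (url, fake_fetch(url))

def fetch_all(urls):
    results = queue.Queue()
    threads = [threading.Thread(target=lambda u=u: results.put(fetch(u)))
               for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Queue order depends on which thread finished first; use the tags instead.
    by_url = {}
    while not results.empty():
        url, data = results.get()
        by_url[url] = data
    # Rebuild the results in the order of the original list.
    return [(url, by_url[url]) for url in urls]
```

Because the results are keyed by URL, a URL that appears twice in the list would share a single entry; tagging each item with its index instead avoids that.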


You can also just pair each URL with its index number:

urls = [
    (0, 'http://www.google.com/'),
    (1, 'http://www.lycos.com/'),
    (2, 'http://www.bing.com/'),
    (3, 'http://www.altavista.com/'),
    (4, 'http://achewood.com/'),
]

def fetch(index, url):
    data = urllib2.urlopen(url).read()
    # ... do whatever you need using index, e.g. return it with the data:
    return (index, data)
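A sketch of how the index tags can be used to rebuild the original order, again in Python 3 with a hypothetical `fake_fetch` in place of the real download. Passing a `results` queue into `fetch` is an assumption about how it is wired into the threads; the point is that slot `i` of the output always receives the result for URL `i`, regardless of which thread finished first.

```python
import queue
import threading

# The same (index, url) pairs as the hand-written list, built with enumerate.
urls = list(enumerate([
    'http://www.google.com/',
    'http://www.lycos.com/',
    'http://www.bing.com/',
    'http://www.altavista.com/',
    'http://achewood.com/',
]))

def fake_fetch(url):
    """Hypothetical stand-in for urllib2.urlopen(url).read(); no network access."""
    return "<contents of %s>" % url

def fetch(index, url, results):
    data = fake_fetch(url)
    # Put the index next to the data so the result can be placed back in order.
    results.put((index, data))

def fetch_all(indexed_urls):
    results = queue.Queue()
    threads = [threading.Thread(target=fetch, args=(i, u, results))
               for i, u in indexed_urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Preallocate one slot per URL and fill each slot by its index.
    ordered = [None] * len(indexed_urls)
    while not results.empty():
        index, data = results.get()
        ordered[index] = data
    return ordered

for (index, url), data in zip(urls, fetch_all(urls)):
    print(index, url, data)
```

Using `enumerate` saves writing the indices by hand and guarantees they run from 0 to n-1, which the preallocated list relies on.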
