Python Thread: can't start new thread

I'm trying to run this code:

def VideoHandler(id):
    try:
        cursor = conn.cursor()
        print "Doing {0}".format(id)
        data = urllib2.urlopen("http://myblogfms2.fxp.co.il/video" + str(id) + "/").read()
        title = re.search("<span class=\"style5\"><strong>([\\s\\S]+?)</strong></span>", data).group(1)
        picture = re.search("#4F9EFF;\"><img src=\"(.+?)\" width=\"120\" height=\"90\"", data).group(1)
        link = re.search("flashvars=\"([\\s\\S]+?)\" width=\"612\"", data).group(1)
        id = id
        print "Done with {0}".format(id)
        cursor.execute("insert into videos (`title`, `picture`, `link`, `vid_id`) values('{0}', '{1}', '{2}', {3})".format(title, picture, link, id))
        print "Added {0} to the database".format(id)
    except:
        pass

x = 1
while True:
    if x != 945719:
        currentX = x
        thread.start_new_thread(VideoHandler, (currentX))
    else:
        break
    x += 1

When I run it, it fails with "can't start new thread".


The most likely cause of the error is that you are creating far too many threads (the loop tries to start more than 900,000 of them, one per id) and hitting an OS-level limit.

Your code can be improved in many ways besides this:

  • Don't use the low-level thread module; use the Thread class from the threading module.
  • Join the threads at the end of your code.
  • Limit the number of threads you create to something reasonable: to process all elements, create a small number of threads and let each one process a subset of the data. This is what I propose below, but you could also adopt a producer-consumer pattern, with worker threads taking their work from a Queue.Queue instance (queue.Queue in Python 3).
  • And never, ever have an except: pass statement in your code. Or if you do, don't come crying here when your code does not work and you cannot figure out why. :-)

Here's a proposal:

from threading import Thread
import urllib2
import re

def VideoHandler(id_list):
    for id in id_list:
        try:
            cursor = conn.cursor()
            print "Doing {0}".format(id)
            data = urllib2.urlopen("http://myblogfms2.fxp.co.il/video" + str(id) + "/").read()
            title = re.search("<span class=\"style5\"><strong>([\\s\\S]+?)</strong></span>", data).group(1)
            picture = re.search("#4F9EFF;\"><img src=\"(.+?)\" width=\"120\" height=\"90\"", data).group(1)
            link = re.search("flashvars=\"([\\s\\S]+?)\" width=\"612\"", data).group(1)
            print "Done with {0}".format(id)
            cursor.execute("insert into videos (`title`, `picture`, `link`, `vid_id`) values('{0}', '{1}', '{2}', {3})".format(title, picture, link, id))
            print "Added {0} to the database".format(id)
        except Exception:
            # Report the failure and move on to the next id instead of hiding it.
            import traceback
            traceback.print_exc()

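conn = get_some_dbapi_connection()  # placeholder: open your own DB-API connection here
threads = []
nb_threads = 8
max_id = 945718
for i in range(nb_threads):
    # Split the ids 1..max_id into nb_threads contiguous, non-overlapping chunks.
    id_range = range(1 + i*max_id//nb_threads, 1 + (i+1)*max_id//nb_threads)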
    thread = Thread(target=VideoHandler, args=(id_range,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join() # wait for completion
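
The producer-consumer alternative mentioned in the list above could look roughly like the sketch below. It is only an illustration, not a drop-in replacement: it is written for Python 3 (where the module is called queue rather than Queue), and process_id is a hypothetical stand-in for the real fetch, parse and insert work. The names worker and run, and the queue size of 100, are assumptions as well.

```python
import queue
import threading
import traceback


def process_id(video_id):
    # Hypothetical stand-in for the real work (download page, parse, insert row).
    return video_id * 2


def worker(tasks, results, lock):
    # Each worker pulls ids from the shared queue until it sees the None sentinel.
    while True:
        video_id = tasks.get()
        if video_id is None:
            tasks.task_done()
            break
        try:
            value = process_id(video_id)
            with lock:  # protect the shared results list
                results.append(value)
        except Exception:
            traceback.print_exc()  # report the error, keep the worker alive
        finally:
            tasks.task_done()


def run(ids, nb_threads=8):
    tasks = queue.Queue(maxsize=100)  # bounded, so the producer cannot run far ahead
    results = []
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(tasks, results, lock))
               for _ in range(nb_threads)]
    for t in threads:
        t.start()
    for video_id in ids:   # producer: feed the work items
        tasks.put(video_id)
    for _ in threads:      # one sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(len(run(range(1, 1001))))
```

Compared with splitting the ids into fixed chunks, the queue balances the load automatically: a worker that hits slow pages simply takes fewer ids, while the number of threads stays fixed at nb_threads.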


The OS limits the number of threads a process can create, so you cannot keep starting new ones past that limit. A thread pool such as ThreadPool is a good choice for this kind of highly concurrent work.
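
As a rough sketch of the thread pool idea, assuming that ThreadPool here means multiprocessing.pool.ThreadPool from the standard library: the pool keeps a fixed number of threads alive and hands the ids to them. process_id is a hypothetical placeholder for the real per-video work, and run_pool is just an illustrative name.

```python
from multiprocessing.pool import ThreadPool


def process_id(video_id):
    # Hypothetical placeholder for the real work (download page, parse, insert row).
    return video_id, "title-%d" % video_id


def run_pool(ids, nb_threads=8):
    # Only nb_threads threads ever exist, no matter how many ids there are.
    pool = ThreadPool(nb_threads)
    try:
        # imap_unordered yields each result as soon as its task finishes.
        return list(pool.imap_unordered(process_id, ids))
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(len(run_pool(range(1, 101))))
```

With this approach the thread count is bounded by nb_threads, so the OS limit is never approached even for hundreds of thousands of ids.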
