Python multithreading synchronization

2023-04-01 19:32 Q&A author:

I am having a synchronization problem while threading with CPython. I have two files; I parse them and return the desired result. However, the code below behaves strangely: it returns three results instead of two, and it doesn't return them in the order I put them into the queue. Here's the code:

import Queue
import threading
from HtmlDoc import Document

OUT_LIST = []

class Threader(threading.Thread):
    """
    Start threading
    """
    def __init__(self, queue, out_queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.out_queue = out_queue


    def run(self):
        while True:
            if self.queue.qsize() == 0: break

            path, host = self.queue.get()

            f = open(path, "r")
            source = f.read()
            f.close()

            self.out_queue.put((source, host))           
            self.queue.task_done()



class Processor(threading.Thread):
    """
    Process threading
    """
    def __init__(self, out_queue):
        self.out_queue = out_queue
        self.l_first = []
        self.f_append = self.l_first.append
        self.l_second = []
        self.s_append = self.l_second.append
        threading.Thread.__init__(self)


    def first(self, doc):
        # some code to retrieve the desired text; this works 100%, I tested it manually

    def second(self, doc):
        # some code to retrieve the desired text; this works 100%, I tested it manually

    def run(self):
        while True:
            if self.out_queue.qsize() == 0: break

            doc, host = self.out_queue.get()

            if host == "first":
                self.first(doc)
            elif host == "second":
                self.second(doc)

            OUT_LIST.extend(self.l_first + self.l_second)

            self.out_queue.task_done()


def main():

    queue = Queue.Queue()
    out_queue = Queue.Queue()

    queue.put(("...first.html", "first"))
    queue.put(("...second.html", "second"))

    qsize = queue.qsize()

    for i in range(qsize):
        t = Threader(queue, out_queue)
        t.setDaemon(True)
        t.start()

    for i in range(qsize):
        dt = Processor(out_queue)
        dt.setDaemon(True)
        dt.start()

    queue.join()
    out_queue.join()

    print '<br />'.join(OUT_LIST)

main()

Now, when I print, I'd like to print the content of the "first" first of all and then the content of the "second". Can anyone help me?

NOTE: I am threading because I will actually have to connect to more than 10 places at a time and retrieve their results. I believe that threading is the most appropriate way to accomplish such a task.

"I am threading because I will actually have to connect to more than 10 places at a time and retrieve their results. I believe that threading is the most appropriate way to accomplish such a task."

Threading is actually one of the most error-prone ways to manage multiple concurrent connections. A more powerful, more debuggable approach is event-driven asynchronous networking, such as the model implemented by Twisted. If you're interested in using this model, you might want to check out this introduction.
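To give a rough idea of the event-driven style, here is a minimal sketch. It uses Python 3's standard asyncio module rather than Twisted, which is a substitution on my part, and the fetch coroutine is a hypothetical stand-in that only simulates network latency with a sleep. The point it illustrates is that asyncio.gather returns results in the order the jobs were submitted, regardless of which one finishes first.

```python
import asyncio


async def fetch(host, delay):
    # Hypothetical stand-in for a real network request;
    # asyncio.sleep only simulates the latency.
    await asyncio.sleep(delay)
    return "result from " + host


async def fetch_all(jobs):
    # gather() runs the coroutines concurrently and returns their results
    # in the order they were passed in, not in completion order.
    return await asyncio.gather(*(fetch(host, delay) for host, delay in jobs))


# "first" is deliberately slower than "second" to show that ordering holds.
results = asyncio.run(fetch_all([("first", 0.2), ("second", 0.1)]))
print(results)  # ['result from first', 'result from second']
```

With this model there is a single thread, so no locks or queues are needed to collect the results.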

I don't share the opinion that threading is the best way to do this (IMO an event/select mechanism would be better), but the problem with your code could be in the variables t and dt. You assign them inside the loop and the object instances are not stored anywhere, so it may be possible that each new Threader/Processor instance gets deleted at the end of each iteration.

It would be clearer if you showed us the exact output of this code.
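A minimal sketch of keeping the thread objects around, assuming Python 3 and a hypothetical worker function in place of the question's classes. Note that in CPython a started thread keeps running even after its variable is rebound, so missing references may not be the actual cause here; holding on to them is still useful because it lets the main thread join each worker explicitly.

```python
import threading

results = []
lock = threading.Lock()


def worker(n):
    # Hypothetical job: the lock guards the shared list.
    with lock:
        results.append(n * n)


threads = []
for i in range(4):
    t = threading.Thread(target=worker, args=(i,))
    t.start()
    threads.append(t)  # keep a reference to every thread

# Wait for every worker to finish before reading the results.
for t in threads:
    t.join()

print(sorted(results))  # [0, 1, 4, 9]
```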

1) You cannot control the order in which jobs complete; it depends on execution time. To return results in the order you want, you can create a global dictionary keyed by job, such as job_results = {'first': None, 'second': None}, store each result there, and then fetch the data in the desired order.

2) self.l_first and self.l_second should be cleared after each processed doc; otherwise you will get duplicates in OUT_LIST.

3) You could use multiple processes instead (for example via the subprocess module), write all result data to CSV files, and then sort them as you wish.
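Points 1 and 2 can be sketched roughly as follows. This assumes Python 3 (where the module is named queue rather than Queue), and process is a hypothetical stand-in for the question's first()/second() parsers. Each document gets a fresh result list, so nothing carries over between documents, and the output is assembled from the dictionary in a fixed order once all workers have finished.

```python
import queue
import threading

HOST_ORDER = ["first", "second"]  # desired output order
job_results = {host: None for host in HOST_ORDER}
results_lock = threading.Lock()


def process(doc, host):
    # Hypothetical parser: builds a new list per document,
    # so results from earlier documents are never duplicated.
    return [host + ":" + word for word in doc.split()]


def worker(jobs):
    while True:
        try:
            # get_nowait() avoids the race between qsize() and get().
            doc, host = jobs.get_nowait()
        except queue.Empty:
            return
        parsed = process(doc, host)
        with results_lock:
            job_results[host] = parsed
        jobs.task_done()


jobs = queue.Queue()
jobs.put(("c d", "second"))  # queued out of order on purpose
jobs.put(("a b", "first"))

threads = [threading.Thread(target=worker, args=(jobs,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Read the results back in the desired order, not completion order.
out_list = []
for host in HOST_ORDER:
    out_list.extend(job_results[host])
print("<br />".join(out_list))  # first:a<br />first:b<br />second:c<br />second:d
```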
