Simple asynchronous I/O: many threads, one file

I have a scientific application which I usually run in parallel with xargs, but this scheme incurs repeated JVM start costs and neglects cached file I/O and the JIT compiler. I've already adapted the code to use a thread pool, but I'm stuck on how to save my output.

The program (i.e. one thread of the new program) reads two files, does some processing and then prints the result to standard output. Currently, I've dealt with output by having each thread add its result string to a BlockingQueue. Another thread takes from the queue and writes to a file, as long as a Boolean flag is true. Then I awaitTermination and set the flag to false, triggering the file to close and the program to exit.

My solution seems a little kludgey; what is the simplest and best way to accomplish this? How should I write primary result data from many threads to a single file?

The answer doesn't need to be Java-specific if it is, for example, a broadly applicable method.

Update

I'm using "STOP" as the poison pill.

while (true) {
    String line = queue.take();
    if (line.equals("STOP")) {
        break;
    } else {
        output.write(line);
    }
}
output.close();

I manually start the queue-consuming thread, then add the jobs to the thread pool, wait for the jobs to finish and finally poison the queue and join the consumer thread.


That's really the way you want to do it: have the threads put their output on the queue and then have the writer exhaust it.

The only thing you might want to do to make things a little cleaner is, rather than checking a flag, simply put an "all done" token onto the queue that the writer can use to know that it's finished. That way there's no out-of-band signaling necessary.

That's trivial to do: you can use a well-known string, an enum, or simply a shared object.
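As a rough sketch of the "shared object" variant, the token below is compared by identity rather than by content, so a real result line that happens to contain the same text can't end the writer early. The class name PoisonPillWriter, the DONE token, the "result N" strings, and the StringWriter (standing in for the real output file) are all made up for illustration.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class PoisonPillWriter {
    // Hypothetical "all done" token. new String(...) creates a distinct object,
    // so the identity check below cannot match an ordinary result string.
    static final String DONE = new String("DONE");

    // Writes queued lines to 'out' until the token is taken from the queue.
    static void drain(BlockingQueue<String> queue, Writer out)
            throws IOException, InterruptedException {
        while (true) {
            String line = queue.take();
            if (line == DONE) { // identity comparison, on purpose
                break;
            }
            out.write(line);
            out.write('\n');
        }
        out.flush();
    }

    // Runs 'jobs' placeholder workers and returns everything the writer wrote.
    static String run(int jobs) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        StringWriter out = new StringWriter(); // stand-in for a FileWriter

        Thread writer = new Thread(() -> {
            try {
                drain(queue, out);
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        });
        writer.start();

        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < jobs; i++) {
            final int id = i;
            pool.execute(() -> queue.add("result " + id)); // placeholder work
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);

        queue.put(DONE); // all workers are finished, so the token is last
        writer.join();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(run(5));
    }
}
```

The token is only queued after awaitTermination returns, which is what guarantees that no result can arrive behind it.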


You could use an ExecutorService. Submit a Callable that would perform the task and return the string after completion.

When you submit the Callable you get back a Future; store these references, e.g. in a List.

Then simply iterate through the Futures and get the Strings by calling Future#get. This blocks until the task has completed if it hasn't yet; otherwise it returns the value immediately.

Example:

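import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

ExecutorService exec = Executors.newFixedThreadPool(10);
List<Future<String>> tasks = new ArrayList<Future<String>>();
tasks.add(exec.submit(new Callable<String>() {
    public String call() {
        // do stuff
        return <yourString>;
    }
}));

// and so on for the other tasks

// get() blocks until the task is done; it can throw
// InterruptedException and ExecutionException
for (Future<String> task : tasks) {
    String result = task.get();
    // write to output
}
exec.shutdown();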


Many threads processing, one thread writing, and a message queue between them is a good strategy. The only issue left to solve is knowing when all the work is finished. One way to do that is to count how many worker threads you started, and then count how many responses you have received. Something like this pseudocode:

int workers = 0
for each work item {
   workers++
   start the item's worker in a separate thread
}
while workers > 0 {
   take worker's response from a queue
   write response to file
   workers--
}

This approach also works if the workers can find more work items while they are executing. Just include any additional not-yet-processed work in the worker responses, and then increment the workers count and start worker threads as usual.
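The counting pseudocode above might look roughly like this in Java. It is only a sketch: the class name CountingCollector, the upper-casing "work", and the list standing in for the output file are placeholders, and it covers the basic case where each worker sends exactly one response.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

class CountingCollector {
    // Starts one worker per item and collects exactly that many responses.
    static List<String> process(List<String> items) throws InterruptedException {
        BlockingQueue<String> responses = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);

        int workers = 0;
        for (String item : items) {
            workers++;
            // Placeholder work: each worker sends exactly one response.
            pool.execute(() -> responses.add(item.toUpperCase()));
        }
        pool.shutdown(); // no new tasks; already submitted ones still run

        List<String> written = new ArrayList<>(); // stand-in for the output file
        while (workers > 0) {
            written.add(responses.take()); // blocks until a response arrives
            workers--;
        }
        return written;
    }

    public static void main(String[] args) throws InterruptedException {
        for (String line : process(Arrays.asList("a", "b", "c"))) {
            System.out.println(line);
        }
    }
}
```

One caveat with this scheme: if a worker dies without sending a response, the loop waits forever, so real workers would need to report failures through the queue as well.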

If each of the workers returns just one message, you can use Java's ExecutorService to execute Callable instances which return the result. ExecutorService's methods give access to Future instances from which you can get the result when the Callable has finished its work.

So you would first submit all the tasks to the ExecutorService and then loop over all the Futures and get their responses. That way you would write the responses in the order in which you check the futures, which can be different from the order in which they finish their work. If latency is not important, that shouldn't be a problem. Otherwise, a message queue (as mentioned above) might be more suitable.


It's not clear if your output file has some defined order or if you just dump your data there. I assume it has no order.

I don't see why you need an extra thread for writing to output. Just synchronize the method that writes to the file and call it at the end of each thread.
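A minimal sketch of that idea, assuming unordered output: the class SynchronizedSink, its writeLine method, and the "result N" strings are invented for illustration, and a StringWriter stands in for the real file.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SynchronizedSink {
    private final Writer out;

    SynchronizedSink(Writer out) {
        this.out = out;
    }

    // Only one thread at a time can be inside this method,
    // so a line is always written as a whole.
    synchronized void writeLine(String line) throws IOException {
        out.write(line);
        out.write('\n');
    }

    // Runs 'jobs' placeholder tasks that each write one line when they finish.
    static String run(int jobs) throws Exception {
        StringWriter buffer = new StringWriter(); // stand-in for a FileWriter
        SynchronizedSink sink = new SynchronizedSink(buffer);

        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < jobs; i++) {
            final int id = i;
            pool.execute(() -> {
                try {
                    sink.writeLine("result " + id); // last step of the task
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return buffer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(run(5));
    }
}
```

The trade-off is that workers can block each other while writing, which matters little if the writes are short compared with the computation.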


If you have many threads writing to the same file, the simplest thing to do is to write to that file from within the task.

final PrintWriter out = 
ExecutorService es =
for (int i = 0; i < tasks; i++)
    es.submit(new Runnable() {
        public void run() {
            performCalculations();
            // so only one thread can write to the file at a time.
            synchronized (out) {
                writeResults(out);
            }
        }
    });
es.shutdown();
// wait (up to an hour) for all tasks to finish before closing the file
es.awaitTermination(1, TimeUnit.HOURS);
out.close();