Java 7: Fork/Join Framework
Can开发者_如何学Go someone explain what Fork/Join is?
Fork Join is a new framework that has an easier to use API for a parallel, divide and conquer algorithm.
Say you have a long running task that, for this instance, has a complicated algorithm. You would want to fork the large tasks and now work on those two tasks. Now lets say that that those two tasks are still too big, you would fork each into two tasks (at this point there are four).
You would continue this until each task is at an acceptable size and invoke the algorithm. It is important to know the invocation of each task is done in parallel. When the task is completed it is joined with the other task it was forked with and consolidate the results.
This will continue until all tasks have been joined and one task is returned.
In addition to what was already said, fork/join utilizes work stealing - threads that run out of things to do can steal tasks from other threads that are still busy. And here is an example that can help you to understand how FJ can be used:
public class SumCounter extends RecursiveTask<Long> {
private final Node node;
public SumCounter(Node node) {
this.node = node;
}
@Override
protected Long compute() {
long sum = node.getValue();
List<ValueSumCounter> subTasks = new LinkedList<>();
for(Node child : node.getChildren()) {
SumCounter task = new SumCounter(child);
task.fork(); // run asynchronously
subTasks.add(task);
}
for(SumCounter task : subTasks) {
sum += task.join(); // wait for the result
}
return sum;
}
public static void main(String[] args) {
Node root = getRootNode();
new ForkJoinPool().invoke(new SumCounter(root));
}
}
Say you have a collection of things that need to be processed. You have a number of threads that can grab subsets of this collection and process them. They all do this concurrently (the fork part) then wait on the last one to finish (the join part) before returning.
I will answer what is Fork Join parallelism. This is one of the parallel design pattern widely used in many systems to achieve concurrency. I will explain this design pattern using a example.
For example lets say we have program which executes sequence of tasks :
A -> B -> C -> D. Here A,B,C,D are tasks.
- A takes 8 seconds
- B takes 4 seconds
- C takes 6 seconds
- D takes 7 seconds
So it will take total of 8+4+6+7 = 25 seconds for this program execution.
Now you found out that tasks A,B,C are independent and D is depend on A,B,C tasks' results. Now you may get a feeling that instead of waiting for A to finish we can start the execution of B simultaneously. Same for Task C can start the task simultaneously with A and B. What you can do is: You can invoke 3 new threads by your main thread and assign them A,B,C tasks and wait for the results before start the execution of task D. If your machine has multiple cores then these threads can run in parallel.
Now the execution time taken for the program is :
max(time_taken_A,_B,_C) + time_taken_D + threading_overhead_time
which is almost equal to = 8 + 7 + k = 15 + k;
In fork join parallelism we can offload tasks with a new thread only if these tasks are independent. Other wise we will face race conditions. If you have a program where one task is waiting for another task execution but this is not dependent on its results then you can offload these two tasks with new threads using fork join parallelism and you can get performance boost. But always think about the threading over head. If your tasks are very light weighted then using these parallel patterns will decrease your performance because of the thread creation, context switching overheads.
精彩评论