Java + Threads: processing lines in parallel
I want to process a large number of independent lines in parallel. In the following code I'm creating a pool of NUM_THREAD Threads, each containing POOL_SIZE lines. Each thread is started, and I then wait for each thread using 'join'.
I guess this is bad practice because a finished Thread has to wait for its siblings in the pool.
What would be the correct way to implement this code? Which classes should I use?
Thanks!
class FasterBin extends Thread
{
private List<String> dataRows=new ArrayList<String>();
private Object result=null;
@Override
public void run()
{
for(String s:dataRows)
{
//Process item here (....)
}
}
}
(...)
List<FasterBin> threads=new Vector<FasterBin>();
String line;
Iterator<String> iter=(...);
for(;;)
{
while(threads.size()< NUM_THREAD)
{
FasterBin bin=new FasterBin();
while(
bin.dataRows.size() < POOL_SIZE &&
iter.hasNext()
)
{
nRow++;
bin.dataRows.add(iter.next());
}
if(bin.dataRows.isEmpty()) break;
threads.add(bin);
}
if(threads.isEmpty()) break;
for(FasterBin t:threads)
{
t.start();
}
for(FasterBin t:threads)
{
t.join();
}
for(FasterBin t:threads)
{
save(t.result);// ## do something with the result (save into a db etc...)
}
threads.clear();
}
finally
{
while(!threads.isEmpty())
{
FasterBin b=threads.remove(threads.size()-1);
try {
b.interrupt();
}
catch (Exception e)
{
}
}
}
Do NOT do all this by yourself! It is extremely hard to get 1) robust and 2) right.
Instead, rewrite your code to create a lot of Runnables or Callables, and use a suitable ExecutorService to process them with the behaviour you want.
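A minimal sketch of what that could look like, assuming each batch of lines becomes one Callable and the results are collected in input order with invokeAll. The names LineBatchProcessor, processLine and processAll are hypothetical and not from the question; processLine merely stands in for the "Process item here" step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LineBatchProcessor {
    // Hypothetical per-line work; replace with the real processing.
    static String processLine(String line) {
        return line.toUpperCase();
    }

    // Splits the lines into batches, runs one Callable per batch on a
    // fixed-size pool, and returns the results in input order.
    static List<String> processAll(List<String> lines, int numThreads, int batchSize) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (int i = 0; i < lines.size(); i += batchSize) {
                int end = Math.min(i + batchSize, lines.size());
                // Copy the batch so the task does not depend on the caller's list.
                final List<String> batch = new ArrayList<>(lines.subList(i, end));
                tasks.add(() -> {
                    List<String> out = new ArrayList<>();
                    for (String s : batch) {
                        out.add(processLine(s));
                    }
                    return out;
                });
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task is done and returns the
            // futures in the same order as the submitted tasks.
            for (Future<List<String>> f : pool.invokeAll(tasks)) {
                results.addAll(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // prints [A, B, C, D, E]
        System.out.println(processAll(Arrays.asList("a", "b", "c", "d", "e"), 2, 2));
    }
}
```

Because there can be many more batches than threads, a thread that finishes one batch simply picks up the next one instead of sitting idle.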
Note that this stays inside the current JVM. If you have more than one JVM available (on multiple machines), I would recommend opening a new question.
Use java.util.concurrent.ThreadPoolExecutor:
ThreadPoolExecutor x=new ScheduledThreadPoolExecutor(10);
x.execute(runnable);
See this for an overview: Java API for util.concurrent
Direct use of Threads is actually discouraged. Look at the package java.util.concurrent; there you'll find thread pools and Futures, which should be used instead.
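As a rough illustration of pools and Futures together, here is a sketch that uses an ExecutorCompletionService so each result can be handled as soon as its task finishes, rather than after the slowest one. The class name CompletionOrderDemo and the methods process and processAsCompleted are made up for this example; the "save" step from the question is represented by adding to a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CompletionOrderDemo {
    // Hypothetical per-line work; replace with the real processing.
    static String process(String line) {
        return line.trim();
    }

    // Submits one task per line and collects the results in the order
    // in which the tasks complete (not necessarily the input order).
    static List<String> processAsCompleted(List<String> lines, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        CompletionService<String> done = new ExecutorCompletionService<>(pool);
        try {
            for (String line : lines) {
                done.submit(() -> process(line));
            }
            List<String> saved = new ArrayList<>();
            for (int i = 0; i < lines.size(); i++) {
                // take() blocks until some task has finished and
                // returns that task's Future.
                saved.add(done.take().get()); // stands in for save(result)
            }
            return saved;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(processAsCompleted(Arrays.asList(" a ", " b", "c "), 2));
    }
}
```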
Thread.join doesn't mean that the Thread waits for the others; it means your main Thread waits for the joined Thread to die. Since you join every Thread in the list, your main Thread ends up waiting for the slowest worker Thread to finish. I don't see a problem with this approach.
Yes, in some sense, a finished Thread would have to wait for its siblings in the pool: when a thread finishes, it stops and does not help the other threads finish sooner. Better said, the whole job waits for the thread that runs the longest.
This is because each thread has exactly one task. You would do better to create many tasks, many more than the number of threads, and put them all in a single queue. Let all worker threads take their tasks from that queue in a loop. Then the threads' finishing times differ by roughly the time it takes to execute one task, which is small because the tasks are small.
You can start the pool of worker threads yourself, or you can wrap each task in a Runnable and submit them to a standard thread pool; it makes no difference.
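The "start the workers yourself" variant might look roughly like the sketch below, assuming one line is one task and the work is simply summing line lengths. QueueWorkers and totalLength are hypothetical names chosen for the example. Note that ConcurrentLinkedQueue does not accept null elements, so the lines must be non-null.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

class QueueWorkers {
    // Every worker repeatedly polls the shared queue until it is empty,
    // so no worker sits idle while tasks remain.
    static long totalLength(List<String> lines, int numThreads) {
        final Queue<String> queue = new ConcurrentLinkedQueue<>(lines);
        final AtomicLong total = new AtomicLong();
        Thread[] workers = new Thread[numThreads];
        for (int i = 0; i < numThreads; i++) {
            workers[i] = new Thread(() -> {
                String line;
                // poll() returns null once the queue is drained.
                while ((line = queue.poll()) != null) {
                    total.addAndGet(line.length()); // placeholder for real work
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        // prints 6
        System.out.println(totalLength(Arrays.asList("ab", "c", "def"), 3));
    }
}
```

All the queue is filled before the workers start here, which is why an empty poll() can be treated as "no more work"; if lines were still being added while workers run, a BlockingQueue with an explicit end marker would be a more suitable choice.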