开发者

best practice for directory polling

I have to do batch processing to automate business process. I have to poll directory at regular interval to detect new files and do processing. While old files is being processed, new files can come in. For now, I use quartz scheduler and thread synchronization to ensure that only one thread can process files.

Part of the code are:

application-context.xml

<bean id="methodInvokingJob"
 开发者_运维问答 class="org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean"><br/>
  <property name="targetObject" ref="documentProcessor" /><br/>
  <property name="targetMethod" value="processDocuments" /><br/>
</bean>

DocumentProcessor

.....

public void processDocuments() { 
  LOG.info(Thread.currentThread().getName() + " attempt to run.");
  if (!processing) {
     synchronized (this) {
        try {
           processing = true;
           LOG.info(Thread.currentThread().getName() + " is processing");
           List<String> xmlDocuments = documentManager.getFileNamesFromFolder(incomingFolderPath);               
           // loop over the files and processed unlock files.
           for (String xmlDocument : xmlDocuments) {
              processDocument(xmlDocument);
           }
        }
        finally {
           processing = false;
        }
     }
  }
}

For the current code, I have to prevent other thread to process files when one thread is processing. Is that a good idea ? or we support multi-threaded processing. In that case how can I know which files is being process and which files has just arrived ? Any idea is really appreciated.


I would build it with these parts:

  1. Castle Transactions with TxF

  2. FileSystemWatcher JavaVersion

  3. TransactionScope (no java version unless you hack it a lot)

  4. A lock-free queue * (Paper discussing perf Java vs .Net, might be able to get source from them for Java) Java lock-based queues

    Such that:

When there's a new file, the file system watcher detects it (remember to put the correct flags, handle the error condition and set Enbled <- True and watch out for doubles), puts the file path in the queue.

You have an application thread, n worker threads. If this is the only app, they spin-wait on the queue, TryDequeue, otherwise they block on a monitor while(!Monitor.Enter(has_items)) ;

When a worker threads get a path through the de-queue operation, it starts working on it, and now no other thread can work on it. If there are doubles of output (depending on your setup), you can then use a file transaction as you are writing the output file. If the Commit operation fails, then you know another thread has already written the output file, and resume polling the queue.

  • Race condition, see: http://groups.google.com/group/lock-free/browse_thread/thread/c3b83466b27f6372


I'd do the following:

  • One thread that gets your filenames and adds them to a synchronized queue.

  • Multiple threads to do the actual reading: get an item from the synced queue and process it.

To check if a file is used you can simply try to rename/move it.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜