Best practice for directory polling
I have to do batch processing to automate a business process. I have to poll a directory at regular intervals to detect new files and process them. While old files are being processed, new files can come in. For now, I use the Quartz scheduler and thread synchronization to ensure that only one thread can process files.
Part of the code is:
application-context.xml
<bean id="methodInvokingJob"
      class="org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean">
    <property name="targetObject" ref="documentProcessor" />
    <property name="targetMethod" value="processDocuments" />
</bean>
DocumentProcessor
.....
public void processDocuments() {
    LOG.info(Thread.currentThread().getName() + " attempt to run.");
    // Check and set the flag under the same lock, so that a thread arriving
    // during a run skips it instead of queueing up behind the lock.
    synchronized (this) {
        if (processing) {
            return;
        }
        processing = true;
    }
    try {
        LOG.info(Thread.currentThread().getName() + " is processing");
        List<String> xmlDocuments = documentManager.getFileNamesFromFolder(incomingFolderPath);
        // Loop over the files and process the unlocked ones.
        for (String xmlDocument : xmlDocuments) {
            processDocument(xmlDocument);
        }
    } finally {
        synchronized (this) {
            processing = false;
        }
    }
}
With the current code, I have to prevent other threads from processing files while one thread is processing. Is that a good idea? Or should we support multi-threaded processing? In that case, how can I know which files are being processed and which files have just arrived? Any ideas are really appreciated.
I would build it with these parts:
Castle Transactions with TxF
FileSystemWatcher (Java version)
TransactionScope (no Java version unless you hack it a lot)
A lock-free queue * (paper discussing perf, Java vs .NET; you might be able to get the Java source from them), or Java lock-based queues
Such that:
When there's a new file, the file system watcher detects it (remember to set the correct flags, handle the error condition, set Enabled <- True, and watch out for duplicate events) and puts the file path in the queue.
You have an application thread and n worker threads. If this is the only app, they spin-wait on the queue with TryDequeue; otherwise they block on a monitor: while(!Monitor.Enter(has_items)) ;
When a worker thread gets a path through the dequeue operation, it starts working on it, and no other thread can work on it. If duplicate outputs are possible (depending on your setup), you can use a file transaction while writing the output file. If the Commit operation fails, you know another thread has already written the output file, and you resume polling the queue.
* Race condition, see: http://groups.google.com/group/lock-free/browse_thread/thread/c3b83466b27f6372
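On the Java side, the closest standard counterpart to FileSystemWatcher is java.nio.file.WatchService (Java 7 and later). The following is only a minimal sketch of the "watcher puts paths in a queue" part. It assumes a BlockingQueue stands in for the lock-free queue, the class name DirectoryWatcher is made up for illustration, and the transactional output step is not covered:

```java
import java.io.IOException;
import java.nio.file.ClosedWatchServiceException;
import java.nio.file.Path;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.BlockingQueue;

import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.OVERFLOW;

// Hypothetical helper: watches one directory and queues the path of each new entry.
class DirectoryWatcher implements Runnable, AutoCloseable {
    private final Path dir;
    private final BlockingQueue<Path> queue;
    private final WatchService watcher;

    DirectoryWatcher(Path dir, BlockingQueue<Path> queue) throws IOException {
        this.dir = dir;
        this.queue = queue;
        this.watcher = dir.getFileSystem().newWatchService();
        // Register in the constructor so files created after construction are not missed.
        dir.register(watcher, ENTRY_CREATE);
    }

    @Override
    public void run() {
        try {
            while (true) {
                WatchKey key = watcher.take(); // blocks until events are available
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == OVERFLOW) {
                        // Events were lost; a real system would rescan the directory here.
                        continue;
                    }
                    // The event context is the new entry's name relative to the watched directory.
                    Path name = (Path) event.context();
                    queue.put(dir.resolve(name));
                }
                if (!key.reset()) {
                    break; // the directory is no longer accessible
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ClosedWatchServiceException e) {
            // close() was called; exit quietly
        }
    }

    @Override
    public void close() throws IOException {
        watcher.close();
    }
}
```

Worker threads would then call queue.take() and process whatever path they receive. Note that ENTRY_CREATE fires when the entry appears, which may be before the writer has finished, so a worker still needs some way to tell that a file is complete.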
I'd do the following:
One thread that gets your filenames and adds them to a synchronized queue.
Multiple threads to do the actual reading: get an item from the synced queue and process it.
To check whether a file is in use, you can simply try to rename/move it.
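A rough sketch of that layout, under a few assumptions: the names PollingPipeline, tryClaim and runOnce are invented for illustration, the calling thread plays the role of the single producer, the incoming and work directories are on the same file system, and "processing" is reduced to recording the claimed path. Whether a move fails for a file that is still being written depends on the platform, so treat the move mainly as a way to make sure only one worker gets each file:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class PollingPipeline {
    // Sentinel telling a worker to stop; compared by identity only.
    private static final Path STOP = Paths.get("");

    // Claims a file by moving it into workDir. Returns the new path,
    // or null if the move failed (file gone, locked, or already claimed).
    static Path tryClaim(Path file, Path workDir) {
        try {
            return Files.move(file, workDir.resolve(file.getFileName()),
                    StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            return null;
        }
    }

    // One polling pass: list the incoming directory, let the workers claim the files.
    static List<Path> runOnce(Path incoming, Path workDir, int workerCount)
            throws IOException, InterruptedException {
        BlockingQueue<Path> queue = new LinkedBlockingQueue<>();
        List<Path> processed = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            pool.execute(() -> {
                try {
                    for (Path file = queue.take(); file != STOP; file = queue.take()) {
                        Path claimed = tryClaim(file, workDir);
                        if (claimed != null) {
                            processed.add(claimed); // stand-in for the real processing
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // Producer: collect the file names and hand them to the workers.
        try (DirectoryStream<Path> files =
                     Files.newDirectoryStream(incoming, p -> Files.isRegularFile(p))) {
            for (Path file : files) {
                queue.put(file);
            }
        } finally {
            for (int i = 0; i < workerCount; i++) {
                queue.put(STOP); // one stop marker per worker
            }
            pool.shutdown();
        }
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return processed;
    }
}
```

A scheduler could call runOnce(incoming, workDir, n) on every tick. Files that arrive during a pass are simply picked up by the next one, and anything already moved to the work directory is by construction "being processed".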