Should I read the file in a separate thread in this case?
I am writing an application for embedded Linux in which 5% of the processor time goes to reading a file and 95% to processing it. Can I get some performance improvement if I read the file in one thread and keep processing it in another?
I am reading from an MMC card which has DMA support. The file size is 20 MB and it is divided into chunks of 2 KB. I will queue chunks from the reader thread and process them in the processor thread, so thread synchronization is needed only when inserting into and removing from the queue (see the sketch below).
I am programming for ARM9.
Which should be faster: single-threaded or multi-threaded?
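For concreteness, here is a minimal sketch of the queue arrangement described in the question, assuming pthreads; the names (struct chunk_queue, queue_put, queue_get, QUEUE_SLOTS) are illustrative, and end-of-file signalling is omitted:

#include <pthread.h>
#include <string.h>

#define CHUNK_SIZE  (2*1024)
#define QUEUE_SLOTS 8                               /* small ring of pending chunks */

struct chunk_queue {
    unsigned char   buf[QUEUE_SLOTS][CHUNK_SIZE];
    int             head, tail, count;
    pthread_mutex_t lock;                           /* init with PTHREAD_MUTEX_INITIALIZER */
    pthread_cond_t  not_empty, not_full;            /* init with PTHREAD_COND_INITIALIZER */
};

/* called by the reader thread with each 2 KB chunk read from the MMC card */
void queue_put(struct chunk_queue *q, const unsigned char *chunk)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_SLOTS)                 /* queue full: wait for the processor */
        pthread_cond_wait(&q->not_full, &q->lock);
    memcpy(q->buf[q->tail], chunk, CHUNK_SIZE);
    q->tail = (q->tail + 1) % QUEUE_SLOTS;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* called by the processing thread; blocks until a chunk is available */
void queue_get(struct chunk_queue *q, unsigned char *chunk)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)                           /* queue empty: wait for the reader */
        pthread_cond_wait(&q->not_empty, &q->lock);
    memcpy(chunk, q->buf[q->head], CHUNK_SIZE);
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
}

The reader thread would call queue_put() after each 2 KB read and the processing thread would loop on queue_get(); the mutex and condition variables are the only synchronization points, but they are also where the extra overhead mentioned in the answers comes from.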
I recommend not using another thread. Instead use posix_fadvise() to tell Linux to read more of your file in advance. The kernel can be reading the file via DMA while your program is processing data.
This assumes that the kernel has enough free memory for data buffering. If your data processing is using all of the memory then the kernel will ignore posix_fadvise().
The exact call that you need would look something like this:
/* fd is an already-open file descriptor for the input file */
char buffer[2*1024];
off_t pos = 0;
ssize_t ret;

while (1) {
    ret = read(fd, buffer, sizeof(buffer));   /* blocking read of one 2 KB chunk */
    if (ret < 0) abort();                     /* read error */
    if (ret == 0) break;                      /* end of file */
    if (ret != sizeof(buffer)) abort();       /* expect whole 2 KB chunks */
    pos += ret;
    /* ask the kernel to start fetching the next 8 KB while this chunk is processed */
    ret = posix_fadvise(fd, pos, 8*1024, POSIX_FADV_WILLNEED);
    if (ret) abort();
    process(buffer);
}
The only way to know for sure is to try it. But it sounds as if you only need to read chunks of the file as the processing code needs them, and since you're processor-bound, the most improvement you could expect is the 5% of the time currently spent reading.
Two threads would also require an in-memory buffer to hold the next chunk of the file so that it is immediately available for processing, and many embedded systems are extremely limited in available memory.
Right now, when you call read(), your program blocks while the data is read and resumes when the read completes, at which point your processing code takes over. The blocked time does not show up as "CPU time" in the output of time, because the process is in a sleep state during that period (this relies on DMA being available, which it is).
Over the whole program, the wall-clock time therefore exceeds the CPU time by roughly the time it takes to read the file; a second thread can reclaim at most that much, but your CPU time will not go down (and will probably go up due to synchronization).
There are a couple of things you will want to make sure of.
Can both activities actually be done in parallel? If the hardware or architecture forces one thread to block while the other runs, there will be no gain.
The maximum gain you can expect is about 5% (by Amdahl's law; see the arithmetic below). Is the added coding complexity worth that?
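To spell out that bound: if reading accounts for 5% of the run time and it could be hidden completely behind processing, the best possible speedup is

    speedup = 1 / (1 - 0.05) ≈ 1.05

i.e. roughly a 5% reduction in total run time, before any synchronization overhead is paid.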
I would recommend looking at more efficient ways of processing the file instead. Look closely at what the processing thread is doing and see whether that work can be reduced or sped up.
You would probably get some improvement from being able to process data while the read is in progress, but there will necessarily be some overhead as well. As with any optimization problem, measurement is the key.
The real question is whether it's worth implementing something just in order to measure the difference. For a 5% maximum gain I suspect the answer is no, but it's up to you how much the chance of capturing some of that 5% is worth versus your time.
Does your platform support memory-mapped files? That would allow you to leave the reading up to the OS, which probably does it pretty well.
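If it does, a minimal sketch of that approach might look like the following, assuming the 20 MB file fits in the address space; the path argument and the process() routine are placeholders, not an existing API:

#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

extern void process(const void *chunk);             /* placeholder for the real processing routine */

void process_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) abort();

    struct stat st;
    if (fstat(fd, &st) != 0) abort();

    /* map the whole file read-only; pages are faulted in on demand */
    unsigned char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) abort();

    /* hint that access will be sequential so the kernel reads ahead */
    madvise(data, st.st_size, MADV_SEQUENTIAL);

    for (off_t pos = 0; pos + 2*1024 <= st.st_size; pos += 2*1024)
        process(data + pos);                        /* process one 2 KB chunk in place */

    munmap(data, st.st_size);
    close(fd);
}

With PROT_READ and MAP_PRIVATE the kernel pages the file in behind the scenes, so the 2 KB chunks never need an explicit read() or an intermediate buffer.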
If you read the data sequentially, the additional thread is probably not worth it, because the kernel will read the file ahead and cache its contents in memory. Memory-mapping the file changes little unless you are on an embedded system where the MMC itself is memory-mapped: the file has to be loaded into memory at some point anyway, and those loads will simply be triggered by the attempted reads rather than by explicit calls.
I wrote an article about Multithreaded File Access on ddj.com; it probably answers part of your question.