Custom Prefetch
Are there any programmatic techniques, portable or specific to NT and Linux, that make a number of large files load faster? I am after an 'ahead of time', a priori, whatever you prefer to call it, mechanism that I can control in code for the two OSes in question.
Each file has to be processed in full, i.e. its entire contents are read sequentially from start to end. The aim is to speed up some batch file processing.
I don't know about NT, but one option on Linux would be to use madvise() with the MADV_WILLNEED flag on a memory mapping of the next file, shortly before you actually need it, so that the kernel starts reading it in early.
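As a rough illustration of the madvise idea, here is a minimal Python sketch (Python 3.8+ exposes madvise on mmap objects). The function name map_with_willneed is made up for this example, and it assumes you are happy to read the file through a memory mapping; on platforms without MADV_WILLNEED the hint is simply skipped.

```python
import mmap
import os


def map_with_willneed(path):
    """Map `path` read-only and hint that all of it will be needed soon.

    Returns the mapping, or None for an empty file (which cannot be mapped).
    """
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return None
        # Length 0 maps the whole file. mmap keeps its own duplicate of the
        # descriptor, so the file object can be closed afterwards.
        mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # madvise() and MADV_WILLNEED are only present where the platform
    # supports them; elsewhere this function is just a plain mmap.
    if hasattr(mapping, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
        mapping.madvise(mmap.MADV_WILLNEED)
    return mapping
```

You would call this for the next file while still working on the current one, then read from the returned mapping (for example mapping[:n]) when its turn comes, and close() it afterwards.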
Alternatively, a more portable option would be to do the readahead manually in a thread separate from your buffer-processing thread: read data to fill an X MB buffer in thread A, and process it as fast as you can in thread B.
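The two-thread arrangement could look something like the sketch below. The chunk size, queue depth, and the names process_files and handle_chunk are assumptions for the example, not anything prescribed above; a bounded queue plays the role of the "X MB buffer".

```python
import queue
import threading

CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk size; tune for your workload
MAX_CHUNKS = 8                # caps the read-ahead buffer at about 32 MB


def _reader(paths, chunks):
    """Thread A: read every file sequentially and queue its chunks."""
    try:
        for path in paths:
            with open(path, "rb") as f:
                while True:
                    data = f.read(CHUNK_SIZE)
                    if not data:
                        break
                    chunks.put((path, data))  # blocks while the buffer is full
    finally:
        chunks.put(None)  # sentinel: nothing more will arrive


def process_files(paths, handle_chunk):
    """Thread B (the caller): consume chunks while thread A keeps reading."""
    chunks = queue.Queue(maxsize=MAX_CHUNKS)
    reader = threading.Thread(target=_reader, args=(paths, chunks), daemon=True)
    reader.start()
    while True:
        item = chunks.get()
        if item is None:
            break
        handle_chunk(*item)  # called as handle_chunk(path, data)
    reader.join()
```

The reader runs ahead by at most MAX_CHUNKS chunks, so memory use stays bounded while the disk is kept busy during processing.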
I am not aware of a Win32 (NT) API similar to madvise().
However, I would suggest an approach.
First, pass the Win32 flag FILE_FLAG_SEQUENTIAL_SCAN to CreateFile(). This will allow the Windows operating system to perform better buffering of the file once you have opened it.
With FILE_FLAG_SEQUENTIAL_SCAN, your file parser may operate more quickly once the file is in memory. Unlike madvise() on Linux, the file will not begin loading into memory any earlier due to the use of the Win32 flag.
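For what it's worth, the same kind of hint can be requested without calling CreateFile() directly. The sketch below uses Python's os.open with os.O_SEQUENTIAL, a constant that only exists on Windows; I am assuming here that the C runtime turns it into FILE_FLAG_SEQUENTIAL_SCAN, so treat that mapping as an assumption rather than a guarantee. The name open_sequential is made up for the example.

```python
import os


def open_sequential(path):
    """Open `path` for binary reading, with a sequential-access hint if available."""
    flags = os.O_RDONLY
    # O_BINARY and O_SEQUENTIAL exist only on Windows. getattr() lets the
    # same code run elsewhere, where the file is opened without the hint.
    flags |= getattr(os, "O_BINARY", 0)
    flags |= getattr(os, "O_SEQUENTIAL", 0)
    fd = os.open(path, flags)
    return os.fdopen(fd, "rb")
```

As with the Win32 flag, this only affects how the file is cached once you start reading it; it does not start loading the file any earlier.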
Next, we need to trigger the file to begin loading. Asynchronously read the first page of the file by calling ReadFileEx() with an OVERLAPPED structure and a FileIOCompletionRoutine function. Note that ReadFileEx() requires the file handle to have been opened with FILE_FLAG_OVERLAPPED.
Your FileIOCompletionRoutine can simply return, or you can set the event in the overlapped structure -- see the MSDN documentation of ReadFileEx for details.
Since it would not be a critical failure if the pre-fetch hasn't completed when you actually read from the file, the easiest implementation would be to "fire and forget" -- execute the overlapped file read and then never check the result of it. Be sure that the buffer and the OVERLAPPED structure you pass remain valid until the read completes, though!
If you perform this operation for a file while reading the previous file, the result should be that the next file will commence paging in.
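To show the shape of "prefetch the next file while reading the previous one", here is a portable sketch that swaps the overlapped ReadFileEx() call for a plain background thread. It is not the Win32 mechanism described above, and unlike that approach it reads the whole next file rather than only its first page. The names warm_cache, process_batch, and process_file are hypothetical.

```python
import threading


def warm_cache(path, chunk_size=1024 * 1024):
    """Read `path` and throw the data away so the OS file cache holds it."""
    try:
        with open(path, "rb") as f:
            while f.read(chunk_size):
                pass
    except OSError:
        pass  # fire and forget: a failed prefetch is not an error


def process_batch(paths, process_file):
    """Process files in order, prefetching file i+1 while handling file i."""
    results = []
    for i, path in enumerate(paths):
        if i + 1 < len(paths):
            # Start warming the next file; the result is never checked.
            threading.Thread(
                target=warm_cache, args=(paths[i + 1],), daemon=True
            ).start()
        results.append(process_file(path))
    return results
```

Whether this helps depends on the cache being large enough to hold the next file, and on the extra I/O not competing too badly with the file currently being processed.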
Be aware that this may slow your performance. As the next file begins to page in, the disk I/O to access that file will compete with disk I/O for the file you are currently parsing. If the two files are physically distant from each other on the same disk, the result of pre-fetching might be additional delay as the drive head seeks. Although modern drives have huge buffers which mitigate this, queuing the first page of a new file is likely to cause a head seek.
bdonlan's suggestion of a 'pre-fetch' thread which loads the files asynchronously from the processing would be a workable solution for Win32, also.