Using STL to search for raw bytes and replace it in a file, what is the best / correct way
I'd like to use the STL (C++0x in vs2010) to open a file and binary search through it and then replace when a match is found.
I can use fstream to open a file but i'm a bit confused to how to search the file.
Reading one byte at a time isn't good so it should read a block into a buffer and we then search in the buffer but is this functionality already in the STL ?
I want the buffering part to be automatic, when the search reach the end of buffer it should automatically read in the next block from the file as if it was reading it directly (using the same file offsets, etc..)
The other problem is when it finds a match exactly how should it update the file.
I know that this file functionality exists in windows using CreateFileMapping and MapViewOfFile but i want to use the STL to make my code portable and also by using the STL also more flexible. Using these windows function you can read the file without worrying about buffering, etc, they will even update the file when you change data. This is the functionality i'm after using the STL.
Updat开发者_开发技巧e: I've changed 'binary search' to 'raw byte search' to clear up the confusion, sorry for that.
An example of the raw byte search function ( if you have a better please do tell )
// NOTE: This function doesn't support big files > 4 gb
DWORD_PTR RawByteSearch( PBYTE pBaseAddress, DWORD dwLowSize, PBYTE pSignature, DWORD sigSize )
{
PBYTE pBasePtr = pBaseAddress;
PBYTE pEndPtr = pBaseAddress + dwLowSize;
DWORD_PTR i;
while ( pBasePtr < pEndPtr )
{
for (i = 0; i < sigSize; ++i)
{
if ( pSignature[i] != pBasePtr[i] )
break;
}
if ( i == sigSize ) // Found a match
{
return pBasePtr - pBaseAddress;
}
++pBasePtr; // Next byte
}
return 0;
}
std::search
already does what you need.
std::search(pBasePtr, bEndPtr, pSignature, pSignature+sigSize)
Out of completeness, you don't need to use memory mapping (although that will likely be the most efficient).
A 100% portable solution is to instead use the standard library stream iterators:
std::ifstream file;
std::search(std::istreambuf_iterator<char>(file.rdbuf()) // search from here
, std::istreambuf_iterator<char>() // ... to here
, pSignature // looking for a sequence starting with this
, pSignature+sigSize); // and ending with this
You should avoid the term "binary search". This is a term used for searching in sorted data in logarithmic time, what you are doing is a linear search (in binary data). The C++ standard library has a function for that, it's std::search. It would work something like:
PWORD pos = std::search(pBaseAddress, pBaseAddress+dwLowSize, pSignature, pSignature+pSigSize);
P.S. Hungarian Notation sucks ;)
std::fstream has a function called read that can read a block into a buffer. Am I misunderstanding something?
Edit: I just did a quick search on memory-mapped file implementations. Maybe this can be helpful for you: http://www.boost.org/doc/libs/1_40_0/libs/iostreams/doc/classes/mapped_file.html
BinarySearch actually means starting at the midway point and comparing to slice your search space in two, then going to the midway point of the new search space to divide it further. As your search space halves with every iteration, the operation is O(log N) which is therefore in general a lot faster.
You may wish to do that here by building up an index.
A lot of this depends how your file is formatted. Does it consist of fixed-length "records" with a key at the start of the record and with that key sorted?
If so, you can actually scan every Nth record to create an index in memory, then use binary search to locate where approximately in the file the record would be, then load just that section of the file into memory.
For example, let's say each record is 128 bytes, the key is 4 bytes and there are 32K records, so the file is about 4MB big. If you load every 64th record, you are reading 512 of them and storing 2K of data in memory. When you search your local index you then load an 8K portion into memory and search in that for the actual record you wish to change, and modify it appropriately.
If your records are not all the same length, you will have a problem if your modification tramples over the next record.
精彩评论