Using STL to search for raw bytes and replace it in a file, what is the best / correct way

2023-02-05 10:41 问答作者：

I'd like to use the STL (C++0x in vs2010) to open a file and binary search through it and then replace when a match is found.

I can use fstream to open a file but i'm a bit confused to how to search the file.

Reading one byte at a time isn't good so it should read a block into a buffer and we then search in the buffer but is this functionality already in the STL ?

I want the buffering part to be automatic, when the search reach the end of buffer it should automatically read in the next block from the file as if it was reading it directly (using the same file offsets, etc..)

The other problem is when it finds a match exactly how should it update the file.

I know that this file functionality exists in windows using CreateFileMapping and MapViewOfFile but i want to use the STL to make my code portable and also by using the STL also more flexible. Using these windows function you can read the file without worrying about buffering, etc, they will even update the file when you change data. This is the functionality i'm after using the STL.

Updat开发者_开发技巧e: I've changed 'binary search' to 'raw byte search' to clear up the confusion, sorry for that.

An example of the raw byte search function ( if you have a better please do tell )

// NOTE: This function doesn't support big files > 4 gb
DWORD_PTR RawByteSearch( PBYTE pBaseAddress, DWORD dwLowSize, PBYTE pSignature, DWORD sigSize )
{
 PBYTE pBasePtr = pBaseAddress;
 PBYTE pEndPtr = pBaseAddress + dwLowSize;
 DWORD_PTR i;
 while ( pBasePtr < pEndPtr )
 {
  for (i = 0; i < sigSize; ++i)
  {
   if ( pSignature[i] != pBasePtr[i] )
    break;
  }
  if ( i == sigSize ) // Found a match
  {
   return pBasePtr - pBaseAddress;
  }
  ++pBasePtr; // Next byte
 }
 return 0;
}

std::search already does what you need.

std::search(pBasePtr, bEndPtr, pSignature, pSignature+sigSize)

Out of completeness, you don't need to use memory mapping (although that will likely be the most efficient).

A 100% portable solution is to instead use the standard library stream iterators:

std::ifstream file;
std::search(std::istreambuf_iterator<char>(file.rdbuf()) // search from here
          , std::istreambuf_iterator<char>()             // ... to here
          , pSignature                                   // looking for a sequence starting with this
          , pSignature+sigSize);                         // and ending with this

You should avoid the term "binary search". This is a term used for searching in sorted data in logarithmic time, what you are doing is a linear search (in binary data). The C++ standard library has a function for that, it's std::search. It would work something like:

PWORD pos = std::search(pBaseAddress, pBaseAddress+dwLowSize, pSignature, pSignature+pSigSize);

P.S. Hungarian Notation sucks ;)

std::fstream has a function called read that can read a block into a buffer. Am I misunderstanding something?

Edit: I just did a quick search on memory-mapped file implementations. Maybe this can be helpful for you: http://www.boost.org/doc/libs/1_40_0/libs/iostreams/doc/classes/mapped_file.html

BinarySearch actually means starting at the midway point and comparing to slice your search space in two, then going to the midway point of the new search space to divide it further. As your search space halves with every iteration, the operation is O(log N) which is therefore in general a lot faster.

You may wish to do that here by building up an index.

A lot of this depends how your file is formatted. Does it consist of fixed-length "records" with a key at the start of the record and with that key sorted?

If so, you can actually scan every Nth record to create an index in memory, then use binary search to locate where approximately in the file the record would be, then load just that section of the file into memory.

For example, let's say each record is 128 bytes, the key is 4 bytes and there are 32K records, so the file is about 4MB big. If you load every 64th record, you are reading 512 of them and storing 2K of data in memory. When you search your local index you then load an 8K portion into memory and search in that for the actual record you wish to change, and modify it appropriately.

If your records are not all the same length, you will have a problem if your modification tramples over the next record.

继续阅读：c++11 stl

Using STL to search for raw bytes and replace it in a file, what is the best / correct way

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？