How to search in a BYTE array for a pattern?
I have a BYTE array:
BYTE Buffer[20000];
this array contains the following data:
00FFFFFFFFFFFF0010AC4C4053433442341401030A2F1E78EEEE95A3544C99260F5054A54B00714F8180B3000101010101010101010121399030621A274068B03600DA281100001C000000FF003457314D44304353423443530A000000FC0044454C4C2050323231300A2020000000FD00384B1E5310000A20202020202000FA
My question is: how can I search this array for a pattern like "000000FC"? I don't think it matters much, but I also need the index at which the pattern is found.
Since you're in C++, do it the C++ way:
char a[] = { 0, 0, 0, '\xFC' }; // '\xFC' instead of 0xFC: the int 0xFC would be a narrowing conversion where char is signed
char Buffer[20000] = ...
std::string needle(a, a + 4);
std::string haystack(Buffer, Buffer + 20000); // or "+ sizeof Buffer"
std::size_t n = haystack.find(needle);
if (n == std::string::npos)
{
// not found
}
else
{
// position is n
}
You can also use an algorithm to search the array directly:
#include <algorithm>
#include <iterator>
auto it = std::search(
std::begin(Buffer), std::end(Buffer),
std::begin(a), std::end(a));
if (it == std::end(Buffer))
{
// not found
}
else
{
// subrange found at std::distance(std::begin(Buffer), it)
}
Or, in C++17, you can use a string view:
std::string_view sv(Buffer, sizeof Buffer); // pointer + length; the iterator-pair constructor only exists since C++20
if (std::size_t n = sv.find(needle); n != sv.npos)
{
// found at position n
}
else
{
// not found
}
You want something like memmem (that code is licensed with the GPL).
However, it should not be difficult to roll your own. Like in memmem's implementation, you need a loop that uses memchr to find the first character of your needle in the haystack, and memcmp to test each hit and see if all of your needle is there.
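The loop described above could be sketched as follows. This is not the glibc implementation, just a minimal illustration; the name find_bytes and its parameters are made up for this example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not glibc's memmem): returns a pointer to the first
// occurrence of needle in haystack, or nullptr if there is none.
const unsigned char* find_bytes(const unsigned char* haystack, std::size_t haystack_len,
                                const unsigned char* needle, std::size_t needle_len) {
    if (needle_len == 0) return haystack;          // same convention as memmem
    if (needle_len > haystack_len) return nullptr;

    const unsigned char* pos = haystack;
    // Number of start positions at which the whole needle still fits.
    std::size_t candidates = haystack_len - needle_len + 1;

    while (candidates > 0) {
        // memchr jumps to the next occurrence of the needle's first byte.
        const unsigned char* hit = static_cast<const unsigned char*>(
            std::memchr(pos, needle[0], candidates));
        if (hit == nullptr) return nullptr;
        // memcmp confirms or rejects the full needle at that hit.
        if (std::memcmp(hit, needle, needle_len) == 0) return hit;
        // Skip past this hit and keep looking.
        candidates -= static_cast<std::size_t>(hit - pos) + 1;
        pos = hit + 1;
    }
    return nullptr;
}
```

The index asked for in the question is then the returned pointer minus the start of the buffer.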
It's possible to use raw pointers with std::search().
For example:
#include <algorithm>
BYTE Buffer[20000] = { 0x00, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0xFC };
PBYTE pBufferLast = Buffer + sizeof(Buffer);
BYTE Pattern[] = { 0x00, 0x00, 0x00, 0xFC };
PBYTE pPatternLast = Pattern + sizeof(Pattern);
PBYTE pOccurrence = std::search(Buffer, pBufferLast, Pattern, pPatternLast);
BOOL fFound = (pOccurrence != pBufferLast);
Since C++17, std::search() can use Boyer-Moore search (boyer_moore_searcher), etc.
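A minimal sketch of the searcher overload, assuming a C++17 compiler. The helper name find_pattern and the choice to return -1 for "not found" are illustrative, not part of the standard library.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>

// Hypothetical helper: index of the first occurrence of pattern in buffer,
// or -1 if the pattern does not occur.
std::ptrdiff_t find_pattern(const unsigned char* buffer, std::size_t buffer_len,
                            const unsigned char* pattern, std::size_t pattern_len) {
    const unsigned char* end = buffer + buffer_len;
    // The searcher preprocesses the pattern once; std::search then runs it
    // over the [buffer, end) range.
    auto it = std::search(buffer, end,
                          std::boyer_moore_searcher(pattern, pattern + pattern_len));
    return it == end ? -1 : it - buffer;
}
```

std::boyer_moore_horspool_searcher takes the same arguments, so it can be swapped in to compare the two on real data.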
Try this, just needed it:
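// Returns a pointer to the first byte of needle inside haystack, or NULL if it was not found.
// Note: only the first occurrence of needle[0] is tested, so a match that starts at a later occurrence is missed.
static uint8_t* bytes_find(uint8_t* haystack, size_t haystackLen, uint8_t* needle, size_t needleLen) {
if (needleLen == 0 || needleLen > haystackLen) {
return NULL;
}
uint8_t* match = (uint8_t*)memchr(haystack, needle[0], haystackLen); // the cast is required in C++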
if (match != NULL) {
size_t remaining = haystackLen - ((uint8_t*)match - haystack);
if (needleLen <= remaining) {
if (memcmp(match, needle, needleLen) == 0) {
return match;
}
}
}
return NULL;
}
Here's a simple/naive solution using C buffers:
const char *find_needle(const char *haystack, size_t haystack_length, const char *needle, size_t needle_length) {
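// Stop once the needle no longer fits, so the inner loop never reads past the end of haystack.
for (size_t haystack_index = 0; haystack_index + needle_length <= haystack_length; haystack_index++) {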
bool needle_found = true;
for (size_t needle_index = 0; needle_index < needle_length; needle_index++) {
const auto haystack_character = haystack[haystack_index + needle_index];
const auto needle_character = needle[needle_index];
if (haystack_character == needle_character) {
continue;
} else {
needle_found = false;
break;
}
}
if (needle_found) {
return &haystack[haystack_index];
}
}
return nullptr;
}
A more efficient solution would be the Knuth-Morris-Pratt algorithm, for example, but the implementation is also more complex.
EDIT:
Using C++17, the fastest solution I found was std::search with the boyer_moore_horspool_searcher.
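For comparison, the Knuth-Morris-Pratt approach mentioned above might look roughly like this. It is only a sketch; the name kmp_find and the -1 return convention are assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: index of the first match, or -1 if there is none.
std::ptrdiff_t kmp_find(const unsigned char* haystack, std::size_t haystack_len,
                        const unsigned char* needle, std::size_t needle_len) {
    if (needle_len == 0) return 0;
    if (needle_len > haystack_len) return -1;

    // fail[i] = length of the longest proper prefix of needle[0..i]
    // that is also a suffix of it.
    std::vector<std::size_t> fail(needle_len, 0);
    for (std::size_t i = 1, k = 0; i < needle_len; ++i) {
        while (k > 0 && needle[i] != needle[k]) k = fail[k - 1];
        if (needle[i] == needle[k]) ++k;
        fail[i] = k;
    }

    // Scan the haystack once; on a mismatch fall back through the table
    // instead of re-reading haystack bytes.
    for (std::size_t i = 0, k = 0; i < haystack_len; ++i) {
        while (k > 0 && haystack[i] != needle[k]) k = fail[k - 1];
        if (haystack[i] == needle[k]) ++k;
        if (k == needle_len) return static_cast<std::ptrdiff_t>(i + 1 - needle_len);
    }
    return -1;
}
```

The extra complexity is the failure table; in exchange, every haystack byte is read only once, even for patterns with many repeated bytes such as 00 00 00 FC.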
Tarion's function was almost perfect, but it does one comparison too many (memcmp rechecks the first byte that memchr has already matched), and it fails when the first byte of the needle occurs in the array earlier without the rest of the needle following it. The code below fixes both. Copy the two parts into your C++ project and call the function TestIt().
You can modify it as you wish. Note that a char* can be indexed just like a char[] array, so &buffer[i] gives a pointer to the element at index i. For arrays whose size is not fixed, allocate the memory with malloc. And never use the C string functions (strstr and the like) to search binary data: they treat a zero byte as the end of the string, so they stop at the first 0x00. Use functions that take an explicit length instead.
How to use: the routine is posted in two parts (PART 1/2 and PART 2/2). Copy both parts into your C++ project in full.
Search Bytes Routine for C++ (PART 1/2) - copy the code below into your C++ project
#include <cstring>
// Returns the position of the first match, counted from lngSourceBufferStartPos, or -1 if nothing is found
int FindBytesPosInCharPointer(char* bytSourceBuffer, size_t lngSourceBufferStartPos, size_t lngSourceBufferTotalLen, char* bytBytesToFind, size_t lngBytesToFindLen) {
//when calling this function bytSourceBuffer must always point to index [0]
//use the lngSourceBufferStartPos to use a starting point index
//size_t is unsigned, so check the start position before subtracting it
if (lngBytesToFindLen == 0 || lngSourceBufferStartPos >= lngSourceBufferTotalLen) {
return -1;
}
if (lngBytesToFindLen > lngSourceBufferTotalLen - lngSourceBufferStartPos) {
return -1;
}
//absolute index at which the next search for the first byte starts
size_t lngCurStartPos = lngSourceBufferStartPos;
//last absolute index at which a complete match can still begin
size_t lngLastStartPos = lngSourceBufferTotalLen - lngBytesToFindLen;
while (lngCurStartPos <= lngLastStartPos) {
//find first byte, memchr returns a pointer into the array
char* lngPointerPos = (char*)memchr(&bytSourceBuffer[lngCurStartPos], bytBytesToFind[0], lngLastStartPos - lngCurStartPos + 1);
if (lngPointerPos == NULL) {
//character not found
return -1;
}
//the first byte already matches, so compare only the remaining bytes
//memcmp == 0 means the blocks are identical
if (memcmp(lngPointerPos + 1, &bytBytesToFind[1], lngBytesToFindLen - 1) == 0) {
//BYTES MATCHING
//this is the position in the array counted from lngSourceBufferStartPos
return (int)((size_t)(lngPointerPos - bytSourceBuffer) - lngSourceBufferStartPos);
}
//blocks do not match, search again one byte after this hit
lngCurStartPos = (size_t)(lngPointerPos - bytSourceBuffer) + 1;
}
//-1 is nothing found
return -1;
}
Search Bytes Routine for C++ (PART 2/2) - copy the code below into your C++ project and call TestIt().
#include <cstdlib>
#include <string>
#include <windows.h>
int TestIt() {
//test
char* bytTest = (char*)malloc(10);
if (bytTest == NULL) {
return 0;
}
bytTest[0] = 0;
bytTest[1] = 0;
bytTest[2] = 70;
bytTest[3] = 70;
bytTest[4] = 65; //A
bytTest[5] = 66; //B
bytTest[6] = 1;
bytTest[7] = 70;
bytTest[8] = 68;
bytTest[9] = 69;
//bytes to find
char bytFindBytes[3];
bytFindBytes[0] = 70;
bytFindBytes[1] = 68;
bytFindBytes[2] = 69;
//bytTest and bytFindBytes could be passed without '&' and [0]; the long form shows that &bytTest[0] is a pointer to the first byte,
//whereas bytTest[0] alone would pass the value of that byte (a compile error here)
int lngFindPos = FindBytesPosInCharPointer(&bytTest[0], 0, 10, &bytFindBytes[0], sizeof bytFindBytes);
//shows 7, the index where the bytes 70, 68, 69 start
MessageBoxA(NULL, std::to_string(lngFindPos).c_str(), "Caption", 0);
free(bytTest);
return 1;
}