How to check whether all bytes in a memory block are zero
I have a block of memory with elements of fixed size, say 100 bytes, put into it one after another, all with the same fixed length, so memory looks like this
<element1(100 bytes)><element2(100 bytes)><element3(100 bytes)>...
In some situat开发者_开发知识库ions I need to determine whether all bytes of a certain element are set to the 0-byte because that has a special meaning (I didn't say it was a good idea, but that is the situation I am in).
The question is, how do I do that efficiently. Further: is there a simple function to do it. For setting bytes to zero I can used memset or bzero, but I don't know of any function for checking for zero.
At the moment I am using a loop for the check
char *elementStart = memoryBlock + elementNr*fixedElementSize;
bool special = true;
for ( size_t curByteNr=0; curByteNr<fixedElementSize; ++curByteNr )
{
special &= (*(elementStart+curByteNr)) == 0;
}
Of course, I could loop with a bigger offset and check several bytes at once with a mword or some other suited bigger type. And I guess that would be rather efficient, but I would like to know whether there is a function to take that burden from me.
Suggested functions:
- !memcmp (compareBlock, myBlock, fixedElementSize)
You could perhaps actually use memcmp without having to allocate a zero-valued array, like this:
static int memvcmp(void *memory, unsigned char val, unsigned int size)
{
unsigned char *mm = (unsigned char*)memory;
return (*mm == val) && memcmp(mm, mm + 1, size - 1) == 0;
}
The standard for memcmp does not say anything about overlapping memory regions.
The obvious portable, high efficiency method is:
char testblock [fixedElementSize];
memset (testblock, 0, sizeof testblock);
if (!memcmp (testblock, memoryBlock + elementNr*fixedElementSize, fixedElementSize)
// block is all zero
else // a byte is non-zero
The library function memcmp()
in most implementations will use the largest, most efficient unit size it can for the majority of comparisons.
For more efficiency, don't set testblock
at runtime:
static const char testblock [100];
By definition, static variables are automatically initialized to zero unless there is an initializer.
I can't believe no one posted this yet... a solution that actually looks like C++ and isn't UB for breaking aliasing rules:
#include <algorithm> // std::all_of
#include <cstddef> // std::size_t
// You might only need this
bool
memory_is_all_zeroes(unsigned char const* const begin,
std::size_t const bytes)
{
return std::all_of( begin, begin + bytes,
[](unsigned char const byte) { return byte == 0; } );
}
// but here's this as a bonus
template<typename T_Element, std::size_t T_count>
bool
array_is_all_zeroes( T_Element const (& array)[T_count] )
{
auto const begin = reinterpret_cast<unsigned char const*>(array);
auto const bytes = T_count * sizeof(T_Element);
return memory_is_all_zeroes(begin, bytes);
}
int
main()
{
int const blah[1000]{0};
return !array_is_all_zeroes(blah);
}
This might not satisfy some people's assumptions about efficiency (which are just that, assumptions, until profiled), but I think being valid and idiomatic code are much in its favour.
AFAIK there is no automatically function to check memory.
You could use | to speed up the for-loop, no need for "=="
char *elementStart = memoryBlock + elementNr*fixedElementSize;
char special = 0;
for ( size_t curByteNr=0; curByteNr<fixedElementSize; ++curByteNr )
{
special |= (*(elementStart+curByteNr));
}
and also can you use long for even more speed
char *elementStart = memoryBlock + elementNr*fixedElementSize;
long special = 0;
for ( size_t curByteNr=0; curByteNr<fixedElementSize; curByteNr += sizeof(long) )
{
special |= *(long*)(elementStart+curByteNr);
}
WARNING: the above code is not tested. Please test it first so that the sizeof and casting operator works
I have tested some solutions proposed here and checked memcmp
source code which is not optimized for the OP needs since it has an additional requirement to perform sorting, leading it to compare unsigned char
one by one.
In the following, I propose an optimized function check_memory_zeroed
which performs most of the check on the biggest aligned int available, making it portable, and I compare it with the other solutions proposed in this thread. Time measurement is performed and results printed.
It shows that the proposed solution is near twice better than wallyk's obvious portable high efficiency method and does not need to create an additional array, and six times better than char by char comparison or mihaif's shifted array which saves RAM compared to wallyk's one.
I have also tested my solution without aligning the words check_memory_zeroed_bigestint_not_aligned
and surprisingly, it performs even better. If someone has an explanation, he is welcome.
Here is the code with functional and performance tests on a 1Gb table (the proposed optimized function is the fisrt one : check_memory_zeroed):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
#include <assert.h>
#include <time.h>
#define BIG_TAB_SIZE 1000000000
typedef intmax_t biggestint;
int check_memory_zeroed (void* ptr, size_t size)
{
if (ptr == NULL) return -1;
int bis = sizeof(biggestint);
char* pc = (char*) ptr;
biggestint* pbi0 = (biggestint*) pc;
if ((size_t) pc % bis) /* is aligned ? */
pbi0 = (biggestint*) (pc + (bis - ((size_t) pc % bis))); /* minimal pointer larger than ptr but aligned */
assert ((size_t) pbi0 % bis == 0); /* check that pbi0 is aligned */
for (char* p = pc; p < (char*) pbi0; p++)
if(*p) return 0; /* check beginning of non aligned array */
biggestint* pbi = pbi0;
biggestint* pbiUpper = ((biggestint*) (pc + size)) - 1;
for (;pbi <= pbiUpper; pbi++)
if(*pbi) return 0; /* check with the biggest int available most of the array : its aligned part */
for (char* p = (char*) pbi; p < pc + size; p++)
if(*p) return 0; /* check end of non aligned array */
return 1;
}
int check_memory_zeroed_bigestint_not_aligned (void* ptr, size_t size)
{
if (ptr == NULL) return -1;
biggestint* pbi = (biggestint*) ptr;
biggestint* pbiUpper = ((biggestint*) (((char*) ptr) + size)) - 1;
for (;pbi <= pbiUpper; pbi++)
if(*pbi) return 0; /* check with the biggest int available most of the array, but without aligning it */
for (char* p = (char*) pbi; p < ((char*) ptr) + size; p++)
if(*p) return 0; /* check end of non aligned array */
return 1;
}
int check_memory_zeroed_by_char (void* ptr, size_t size)
{
if (ptr == NULL) return -1;
for (char* p = (char*) ptr; p < ((char*) ptr) + size; p++)
if(*p) return 0;
return 1;
}
/* variant of wallyk solution */
int check_memory_zeroed_by_memcmp_and_testblock (void* ptr, size_t size)
{
void* testblock = malloc(size);
if (ptr == NULL || testblock == NULL) return -1;
memset (testblock, 0, sizeof(testblock));
int res = ! memcmp (testblock, ptr, size);
free (testblock);
return res;
}
/* variant of mihaif solution */
int check_memory_zeroed_by_memcmp_with_shifted_array (void* ptr, size_t size)
{
if (ptr == NULL) return -1;
char* pc = (char*) ptr;
return (*pc) || memcmp(pc, pc + 1, size - 1);
}
int test() {
/* check_memory_zeroed (void* ptr, size_t size) */
char tab[16];
for (int i = 0; i < 8; i++)
for (int j = 0; j < 8; j++) {
for (int k = 0; k < 16; k++) tab[k] = (k >= i && k < 16 - j) ? 0 : 100 + k;
assert(check_memory_zeroed(tab + i, 16 - j - i));
if (i > 0) assert(tab[i-1] == 100 + i - 1);
if (j > 0) assert(tab[16 - j] == 100 + 16 - j);
for (int k = i; k < 16 - j; k++) {
tab[k] = 200+k;
assert(check_memory_zeroed(tab + i, 16 - j - i) == 0);
tab[k] = 0;
}
}
char* bigtab = malloc(BIG_TAB_SIZE);
clock_t t = clock();
printf ("Comparison of different solutions execution time for checking an array has all its values null\n");
assert(check_memory_zeroed(bigtab, BIG_TAB_SIZE) != -1);
t = clock() - t;
printf ("check_memory_zeroed optimized : %f seconds\n",((float)t)/CLOCKS_PER_SEC);
assert(check_memory_zeroed_bigestint_not_aligned(bigtab, BIG_TAB_SIZE) != -1);
t = clock() - t;
printf ("check_memory_zeroed_bigestint_not_aligned : %f seconds\n",((float)t)/CLOCKS_PER_SEC);
assert(check_memory_zeroed_by_char(bigtab, BIG_TAB_SIZE) != -1);
t = clock() - t;
printf ("check_memory_zeroed_by_char : %f seconds\n",((float)t)/CLOCKS_PER_SEC);
assert(check_memory_zeroed_by_memcmp_and_testblock(bigtab, BIG_TAB_SIZE) != -1);
t = clock() - t;
printf ("check_memory_zeroed_by_memcmp_and_testblock by wallyk : %f seconds\n",((float)t)/CLOCKS_PER_SEC);
assert(check_memory_zeroed_by_memcmp_with_shifted_array(bigtab, BIG_TAB_SIZE) != -1);
t = clock() - t;
printf ("check_memory_zeroed_by_memcmp_with_shifted_array by mihaif : %f seconds\n",((float)t)/CLOCKS_PER_SEC);
free (bigtab);
return 0;
}
int main(void) {
printf("Size of intmax_t = %lu\n", sizeof(intmax_t));
test();
return 0;
}
And the results for comparison of different solutions execution time for checking an array has all its values null:
- Size of intmax_t = 8
- check_memory_zeroed optimized : 0.331238 seconds
- check_memory_zeroed_bigestint_not_aligned : 0.260504 seconds
- check_memory_zeroed_by_char : 1.958392 seconds
- check_memory_zeroed_by_memcmp_and_testblock by wallyk : 0.503189 seconds
- check_memory_zeroed_by_memcmp_with_shifted_array by mihaif : 2.012257 seconds
It is not possible to check all 100 bytes at the same time. So, you (or any utility functions) have to iterate through the data in any case. But, besides having a step size bigger than 1 byte, you could do some more optimizations: For example, you could break
as soon as you find a non-zero value. Well, the time complexity would still be O(n), I know.
I can't recall a standard library function which could do this for you. If you are not sure this causes any performance issues I'd just use the loop, maybe replace char* with int* as already suggested.
If you do have to optimize you could unroll the loop:
bool allZeroes(char* buffer)
{
int* p = (int*)buffer; // you better make sure your block starts on int boundary
int acc = *p;
acc |= *++p;
acc |= *++p;
...
acc |= *++p; // as many times as needed
return acc == 0;
}
You may need to add special handling for the end of buffer if it's size is not a multiple of sizeof(int), but it could be more efficient to allocate a slightly larger block with some padding bytes set to 0.
If your blocks are large you could treat them as a sequence of smaller blocks and loop over them, using the code above for each small block.
I would be curious to know how this solution compares with std::upper_bound(begin,end,0)
and memcmp
.
EDIT
Did a quick check how a home-grown implementation compares with memcmp, used VS2010 for that.
In short:
1) in debug mode home-grown can be twice as fast as memcmp
2) in release with full optimization memcmp has an edge on the blocks which start with non-0s. As the length of the zero-filled preamble increases it starts losing, then somehow magically gets almost as fast as homegrown, about only 10% slower.
So depending on your data patterns and need/desire to optimize you could get some extra performance from rolling out your own method, but memcmp is a rather reasonable solution.
Will put the code and results on github in case you could use them.
The following will iterate through the memory of a structure. Only disadvantage is that it does a bytewise check.
#include <iostream>
struct Data { int i; bool b; };
template<typename T>
bool IsAllZero(T const& data)
{
auto pStart = reinterpret_cast<const char*>(&data);
for (auto pData = pStart; pData < pStart+sizeof(T); ++pData)
{
if (*pData)
return false;
}
return true;
}
int main()
{
Data data1;// = {0}; // will most probably have some content
Data data2 = {0}; // all zeroes
std::cout << "data1: " << IsAllZero(data1) << "\ndata2: " << IsEmptyStruct(data2);
return 0;
};
What about using long int
and binary or
operator.
unsigned long long int *start, *current, *end, value = 0;
// set start,end
for(current = start; current!=end; current++) {
value |= *current;
}
bool AllZeros = !value;
Well if you just want to decide whether a single element is all 0s you can create a 100byte element with all 1s
. Now when you want to check whether an element is all 0s just binary AND (&)
the content of the element and the element you created(all 1s). now if the result of binary AND is zero
the element you checked had all 0s otherwise it was not all 0s
the creation of a 100 byte element with all 1s seems costly but if you have a large number of elements to check then its actually better
you can create the 100 byte element with all 1s as void *elem; elem=malloc(100);
now set all bits to 1(use ~(elem&0)
)
精彩评论