Reproducing "Gallery of processor cache effects"
Having read this article, I tried to reproduce the first example on my Mac. However, my curve looks completely different and I don't understand why.
My code is below:
#include <mach/mach_time.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* K is the stride in ints; it is set on the compiler command line,
   e.g. cc -DK=16 ... */
#ifndef K
#define K 1
#endif

void mach_absolute_difference(uint64_t end, uint64_t start, struct timespec *tp)
{
    uint64_t difference = end - start;
    static mach_timebase_info_data_t info = {0, 0};

    if (info.denom == 0)
        mach_timebase_info(&info);

    /* Multiply before dividing: (info.numer / info.denom) truncates in
       integer arithmetic and loses the fractional part of the timebase. */
    uint64_t elapsednano = difference * info.numer / info.denom;

    tp->tv_sec = elapsednano / 1000000000;
    tp->tv_nsec = elapsednano % 1000000000;
}

int main(void)
{
    int len = 64 * 1024 * 1024;
    int *arr = (int *)malloc(sizeof(int) * len);
    uint64_t start, end;
    struct timespec tp;

    start = mach_absolute_time();
    for (int i = 0; i < len; i += K)
        arr[i] = 0;
    end = mach_absolute_time();

    mach_absolute_difference(end, start, &tp);

    FILE *fp = fopen("simple_array.log", "a+");
    fprintf(fp, "%i\t%ld\t%ld\n", K, tp.tv_sec, tp.tv_nsec);
    fclose(fp);

    free(arr);
    return 0;
}
I measured the time as described in this blog, hoping that it's correct. I'm also wondering what I should use to measure either execution time or CPU cycles on a Mac. Even nicer would be to see the number of cache hits/misses for a given function; Shark, however, only shows L2 cache misses as percentages.
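(For reference, on macOS 10.12 and later there is also clock_gettime_nsec_np, which returns monotonic nanoseconds directly and sidesteps the timebase conversion. A minimal sketch, assuming that API is available; the work placeholder is mine:)

#include <stdint.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
    /* CLOCK_UPTIME_RAW ticks on the same timebase as mach_absolute_time(). */
    uint64_t t0 = clock_gettime_nsec_np(CLOCK_UPTIME_RAW);
    /* ... code under test goes here ... */
    uint64_t t1 = clock_gettime_nsec_np(CLOCK_UPTIME_RAW);
    printf("elapsed: %llu ns\n", (unsigned long long)(t1 - t0));
    return 0;
}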
Update: this is the result when compiled for 32-bit; note that the int size changes from 8 bytes to 4 bytes.
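(The type sizes are easy to verify directly; a minimal sketch. Note that on typical Mac compilers the 64-bit model is LP64, where int stays 4 bytes and it is long and pointers that shrink from 8 to 4 bytes in a 32-bit build:)

#include <stdio.h>

int main(void)
{
    /* Under LP64 (64-bit Mac): int=4, long=8, void*=8.
       Under the 32-bit model:  int=4, long=4, void*=4. */
    printf("int=%zu long=%zu ptr=%zu\n",
           sizeof(int), sizeof(long), sizeof(void *));
    return 0;
}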
A few things:
- Your work loop has no observable side effects; are you sure the compiler isn't optimising some or all of it away?
- An int is probably not one byte in size.
- The bigger you make K, the less overall work you're doing (i reaches len in fewer iterations); one way to address this, together with the first point, is sketched below.
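A minimal sketch of one way to fix both issues (not the article's exact harness; the touches count, the wrapping index, and the checksum are my own choices): keep the number of element touches constant regardless of K by wrapping the index, and print a result derived from the array so the compiler cannot discard the loop.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef K
#define K 1                                /* stride in ints, e.g. cc -DK=16 ... */
#endif

int main(void)
{
    const size_t len = 64 * 1024 * 1024;
    int *arr = calloc(len, sizeof(int));   /* zero-init so the reads are defined */
    const size_t touches = len;            /* same amount of work for every K */

    size_t i = 0;
    for (size_t n = 0; n < touches; n++) {
        arr[i] *= 3;                       /* read-modify-write, as in the article */
        i += K;
        if (i >= len)
            i -= len;                      /* wrap (assumes K < len), so the touch
                                              count no longer depends on K */
    }

    /* Observable side effect: fold the array into a checksum and print it,
       so the optimiser cannot remove the loop above. */
    long sum = 0;
    for (size_t j = 0; j < len; j += 4096)
        sum += arr[j];
    printf("checksum %ld\n", sum);

    free(arr);
    return 0;
}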