Why won't my value increment? Memory stomping/stack glitch?
I've come across a rather charming bug that I'm trying to wrap my head around.
I feel like I've 开发者_如何转开发seen this before, but this time I want to understand why this is happening.
I have:
int i;
int debug = 0;
for(i = 0; i < buf_end; i++) {
do_some_buffer_work();
if(something_is_true()) {
do_something_important();
printf("debug is %i, i is %i", debug++, i);
}
}
printf("end\n");
and I get as output:
debug is 1, i is 55
debug is 2, i is 55
so there was one point where the loop was executed twice with the same value for i. nothing I do within the loop directly touches i. Furthermore I doubt this is traditional memory stomping as the value is always the same. I suspected that it was something causing the program counter to move around (since sometimes with bad linking you get similar bugs,) until I did the following test:
int i;
static int debug;
for(i = 0; i < buf_end; i++) {
do_some_buffer_work();
if(something_is_true()) {
do_something_important();
printf("debug is %i, i is %i\n", debug++, i++);
printf("debug is %i, i is %i\n", debug++, i++);
printf("debug is %i, i is %i\n", debug++, i++);
}
}
printf("end\n");
and I got this interesting output:
debug is 0, i is 55
debug is 1, i is 56
debug is 2, i is 57
debug is 3, i is 55
debug is 4, i is 56
debug is 5, i is 57
end
So here it is obvious that two complete iterations with the same i have executed, but the debug variable was not affected. It would appear that the value is being cached and restored for some reason. I had a hunch and changed the debug variable to non-static and got this:
int i;
int debug = 0;
for(i = 0; i < buf_end; i++) {
do_some_buffer_work();
if(something_is_true()) {
do_something_important();
printf("debug is %i, i is %i\n", debug++, i++);
printf("debug is %i, i is %i\n", debug++, i++);
printf("debug is %i, i is %i\n", debug++, i++);
}
}
printf("end\n");
and I got this interesting output:
debug is 0, i is 55
debug is 1, i is 56
debug is 2, i is 57
debug is 0, i is 55
debug is 1, i is 56
debug is 2, i is 57
end
So it looks like the variables that live on the stack are reset to the beginning of the 55th iteration.
I am sure that the bug is in one of these do_something_important() calls - which deal with buffer reading - but this bug has taken on a character of its own and I think I owe it some respect to find out more about its nature before I squash it. So please rather than trying to help me fix it let me know if you have some clue as to why it is happening. More specifically, what can be changed in the program state to 'reset' values like this?
Edit: I'm sorry if people are annoyed that I left out the functions. They are quite large and have references to other functions, but the main reason I left it out was because I don't care about fixing this problem; I want to know how I might recreate it in the simplest way possible.
2nd Edit: Here's the immediate function where the blues happen. Including the referenced function and all subfunctions and definitions is probably around 500 lines so I'm not doing that here.
static int find_headers_search(FCALParseContext *fpc, uint8_t *buf, int buf_size,
int search_start)
{
FCALFrameInfo fi;
int end_offset = -1, size = 0, i;
uint8_t *header_buf;
int debug = 0;
for (i = 0; i < buf_size - 1; i++) {
if ((AV_RB16(buf + i) & 0xFFFE) == 0xFFF8) {
av_log(NULL,AV_LOG_DEBUG,"predebug%i i %i\n",debug, i);
header_buf = fcal_fifo_read_wrap(fpc, search_start + i,
MAX_FRAME_HEADER_SIZE,
&fpc->wrap_buf,
&fpc->wrap_buf_allocated_size);
if (frame_header_is_valid(header_buf, &fi)) {
av_log(NULL,AV_LOG_DEBUG,"frame num %u bufstart %u, size %u, end %u i %i\n", (unsigned int)fi.frame_or_sample_num,
search_start, buf_size, search_start + buf_size -1, i);
FCALHeaderMarker **end_handle = &fpc->headers;
size = 0;
while (*end_handle) {
end_offset = (*end_handle)->offset;
end_handle = &(*end_handle)->next;
size++;
}
*end_handle = av_mallocz(sizeof(FCALHeaderMarker));
if (!*end_handle) {
av_log(fpc->avctx, AV_LOG_ERROR,
"couldn't allocate FCALHeaderMarker\n");
return AVERROR(ENOMEM);
}
(*end_handle)->fi = fi;
(*end_handle)->offset = search_start + i;
/* The actual size of the linked list is now size + 1 */
update_sequences(fpc, size - FCAL_MAX_SEQUENTIAL_HEADERS,
FFMIN(size, FCAL_MAX_SEQUENTIAL_HEADERS),
*end_handle);
fpc->nb_headers_found++;
size++;
av_log(NULL,AV_LOG_DEBUG,"debug%i i %i\n",debug++, i++);
size = 0;
while (*end_handle) {
end_offset = (*end_handle)->offset;
end_handle = &(*end_handle)->next;
size++;
}
*end_handle = av_mallocz(sizeof(FCALHeaderMarker));
if (!*end_handle) {
av_log(fpc->avctx, AV_LOG_ERROR,
"couldn't allocate FCALHeaderMarker\n");
return AVERROR(ENOMEM);
}
(*end_handle)->fi = fi;
(*end_handle)->offset = search_start + i;
/* The actual size of the linked list is now size + 1 */
update_sequences(fpc, size - FCAL_MAX_SEQUENTIAL_HEADERS,
FFMIN(size, FCAL_MAX_SEQUENTIAL_HEADERS),
*end_handle);
fpc->nb_headers_found++;
size++;
av_log(NULL,AV_LOG_DEBUG,"debug%i i %i\n",debug++, i++);
av_log(NULL,AV_LOG_DEBUG,"debug%i i %i\n",debug++, i++);
av_log(NULL,AV_LOG_DEBUG,"debug%i i %i\n",debug++, i++);
}
}
}
return size;
}
There's a buffer, there's a buf_end, there's some 'buffer work'. All invisible in the snippet. Clearly there's some invisible code that writes past the end of the buffer, stomping the local variables debug and i. Set a data breakpoint on the buffer end and you'll find it in a minute or two, usually.
It looks like recursion/reentrancy to me. The following hypothesis would explain the observed behaviour:
do_some_buffer_work()
under some condition calls the function containing this code. This would explain the static being consistent and the locals restarting. The reason you only see it for certain values of i
is because something_is_true()
is only true for those values of i
.
You should probably step the code in your debugger. Observing the call stack at a break-point on your printf() will quickly determine whether recursion has occurred.
This may or not be helpful (not knowing a thing about what do_some_buffer_work()
, something_is_true()
, or do_something_important()
do kind hampers analysis a bit). but may try having 2 loop counter variables in different places and check for when they differ:
int i;
static int ii;
int debug = 0;
for(i = 0, ii = 0; i < buf_end; i++, ii++) {
if (i != ii) debugBreak();
do_some_buffer_work();
if (i != ii) debugBreak();
if(something_is_true()) {
if (i != ii) debugBreak();
do_something_important();
printf("debug is %i, i is %i", debug++, i);
}
if (i != ii) debugBreak();
}
This might give you closer context to when things go bad.
The problem was due to re-evaluating a side effecting function with a MAX(a,b) macro. It was in the client code to this function.
精彩评论