Trace changes to variables automatically
I am debugging a C program (GCC and GDB in Linux and Visual Studio in Windows) that gives different results on two different architectures. I'd like to compare execution on each architecture by tracing the changes to the values stored in variables in order to locate differences.
For example:
file main.c, line 234. Variable A changes from 34 to 23
file main.c, line 236. Variable B changes from 0 to 2
..... etc.
Can the compiler be instructed to instrument the code to this effect, without manually littering the source with printf statements?
I would write a script to autopilot the debugger. I don't know if expect is available on Windows (it probably is), but it's a great tool for writing scripts that drive interactive tools. The script would go something like:
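#!/usr/bin/expect
# Step through a program under gdb, dumping all locals before each line.
set progname [lindex $argv 0]
spawn gdb $progname
expect "(gdb)"
# Stop at the function under suspicion.
send "break main\n"
expect "(gdb)"
send "run\n"
# The exit pattern is listed first so that it wins when gdb prints the exit
# message and the next prompt together. Older gdb prints "Program exited
# normally."; newer versions print "[Inferior 1 (process N) exited normally]".
while 1 {
    expect {
        "exited normally" {
            exit
        }
        "(gdb)" {
            send "info locals\n"
            expect "(gdb)"
            send "next\n"
        }
    }
}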
I would change "main" to the function where you think the program goes wrong. You can also insert any other debugger commands you want, such as printing the current line before printing the variables; here I use "info locals", which prints all local variables. You would need to save this output to a file for analysis. Expect is pretty easy to learn, and its syntax is based on Tcl.
There are several factors to consider in an implicit solution:
- C is barely type aware. For example if you used @Jens Gustedt's suggestion of instrumenting functions, you still have the problem of determining what exactly you're looking at on the stack, and how to appropriately print out the values. IIRC, instrumenting functions won't give you the prototype of the upcoming function, nor will it provide a handy struct with pointers to those values. This would be more akin to C++ templating. You'd have to write an enter/exit function that was aware of the prototypes of all the functions, and have inside knowledge of the precise calling convention and packing arrangements for your variables.
- You need to compile with debugging symbols, in order to identify what memory was associated with a variable defined in your code, and when those variables are being modified.
- More complex types do not have a standard way to be printed (structs for example), even in C++. You'd need to have printing functions that conform to some kind of interface.
- Pointers and arrays of indeterminate length will be very difficult to handle. You'd need to test for NULL pointers, and know array sizes in advance to correctly print their values.
I've attempted something similar to your "printf at function entry" in an existing project I'm working on. It's kind of useful for important functions, but I've gotten far more out of asserting that values are in the expected ranges at function entry and exit. Here's the logging header and source, which demonstrate how to greatly simplify the printing of lines, functions and other useful information around your manual tracing. You'll notice many of my types have a corresponding *_valid() function, and assertions surrounding their use. As soon as my program runs off the tracks (which I assure you is quite frequent during debugging), the asserts fire and I can analyse the situation via the stack trace.
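As a minimal sketch of that *_valid() pattern (the int_buf type and its functions are hypothetical names made up for illustration, not taken from the project mentioned above), a type gets one predicate that checks its invariants, and every function that touches the type asserts it on entry and exit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type, for illustration only. */
typedef struct {
    int *data;   /* storage owned by the caller */
    size_t len;  /* number of elements in use */
    size_t cap;  /* number of elements available */
} int_buf;

/* Invariant check: true when the buffer is in a consistent state. */
bool int_buf_valid(const int_buf *b)
{
    return b != NULL && b->len <= b->cap && (b->cap == 0 || b->data != NULL);
}

/* Appends v if there is room; asserts the invariant on entry and exit. */
bool int_buf_push(int_buf *b, int v)
{
    assert(int_buf_valid(b));
    if (b->len == b->cap)
        return false;
    b->data[b->len++] = v;
    assert(int_buf_valid(b));
    return true;
}
```

When the two builds diverge, the first assertion that fires on only one architecture tends to point much closer to the cause than a diff of the final output.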
You may also find this answer useful regarding both the difficulty of doing this stuff with C alone, and how to work around it using macros.
Your best bet, if you need this done implicitly, is GDB. Presumably you can write scripts to analyse changes after each instruction, and intelligently determine and print types using the debugging info provided by -g.
Valgrind can trace all stores automatically, although the standard tools don't easily provide an explicit trace. You would have to write your own valgrind tool, e.g. by modifying cachegrind, to trace all Ist_Store instructions.
Alas, valgrind doesn't work on Windows.
Both gdb and Visual Studio's debugger support watching variables. A watch is like a breakpoint, but instead of stopping at a place in the code, your program stops when the value of a variable changes. You may find that the two do not behave exactly the same if a variable is assigned the value it already has (I'm not sure there is any difference, but there could be).
Another place where the two debuggers may differ is in watching local variables. One debugger may forget any watched variables when their function returns, another may re-watch them on every call, and another may just end up watching the actual memory location without caring what that memory has been re-purposed as. I'm not sure how these debuggers work in this respect since I'm not in the habit of watching local variables (I'm not sure I've ever watched local variables).
What you will probably want to do is set a breakpoint around where you initialize these variables, step through the initialization, and then set a watch on the variables and let the program run. You can record the changes (either manually or, at least with gdb, by scripting something up) and see where the two programs diverge in behavior.
You will probably find that the size of some type differs between the two systems, that you are using uninitialized memory (which comes set differently on the two systems), or that data structure padding is changing something. If it is more complicated than that, and is not something like an API function erroring out without you checking it, or a warning from one or both compilers that you are ignoring, then it is likely to be very difficult to find.
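One cheap way to check the type-size and padding suspects is to build a small layout report on both systems and diff the two outputs. The following is only a sketch: struct sample and layout_report are hypothetical names, and you would substitute the structs your program actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record, for illustration; substitute your own structs. */
struct sample {
    char tag;
    long count;
    double value;
};

/* Writes one "name=value" line per layout fact into buf.
   Returns what snprintf returns: the length the full report needs. */
int layout_report(char *buf, size_t n)
{
    assert(buf != NULL);
    return snprintf(buf, n,
        "sizeof(int)=%lu\n"
        "sizeof(long)=%lu\n"
        "sizeof(void*)=%lu\n"
        "sizeof(struct sample)=%lu\n"
        "offsetof(struct sample, count)=%lu\n"
        "offsetof(struct sample, value)=%lu\n",
        (unsigned long)sizeof(int),
        (unsigned long)sizeof(long),
        (unsigned long)sizeof(void *),
        (unsigned long)sizeof(struct sample),
        (unsigned long)offsetof(struct sample, count),
        (unsigned long)offsetof(struct sample, value));
}
```

Call it with a buffer of a few hundred bytes and print the result. The values are cast to unsigned long rather than printed with %zu because older Visual Studio runtimes did not accept %zu. A classic difference this exposes is sizeof(long), which is 8 on 64-bit Linux and 4 on 64-bit Windows.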
You should make sure that you turn off all optimizations, unless the problem only presents itself with optimizations turned on.
Something that will probably help you out a lot would be to try to unit test the pieces of the program on each machine. It is much easier to spot and understand differences in behavior for small problems than big problems.
The obvious question to ask is whether your program (and all the libraries it calls) is really architecture independent, or even deterministic (you might have a non-stable sort, for example, or even threads somewhere), or whether you simply have an uninitialized variable somewhere.
Assuming that you believe your program to be entirely deterministic, and you really want a data trace on each assignment, a way to accomplish this is with a program transformation system. Such a tool accepts "if you see this, replace it by that" patterns using the surface syntax of your target language (in this case, C). You instrument your program using transformation rules to automatically find all the places where instrumentation is needed and insert it there, compile and run the instrumented program, and then throw the instrumented version away once you have your trace.
Our DMS Software Reengineering Toolkit could do this. What you want to do is to replace every assignment with a combination of assignment and a printf. A DMS rewrite rule to accomplish this would be:
rule insert_tracing_printfs(l: lhs, e: expression): expression -> expression =
" \lhs=\e " -> "print_and_assign_int(\tostring\(\lhs\),&\lhs,\e)" if int_type(lhs);
The basic rule format is ifyouseethis -> replacebythis if somecondition. The quote marks are really metaquotes, and enable C syntax to be embedded inside the rule language. The \ is an escape from C syntax back into the rule language.
These rules operate on abstract syntax trees generated by DMS as a consequence of parsing the (preprocessed) C source code. This particular rule accurately pattern matches the source code for the exact syntax lhs=e for all legal forms of lhs and e. When it finds such a match, it replaces the assignment by a function call that happens to do the assignment and also prints out your trace value.
The \tostring function takes the lhs tree and generates source text corresponding to the original expression; this is easily accomplished with a DMS API for prettyprinting ASTs. The int_type function interrogates the DMS-generated symbol table to determine if the type of the lhs is int.
You'd need one rule like this for each data type you wanted to print out. You also need a rule for each kind of assignment syntax (e.g., "=", "+=", "%="...) your program uses. So, basic data types and a handful of assignment syntax types suggest you need 1-2 dozen rules like this.
You also need corresponding C functions to match the various data types:
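#include <stdio.h>

/* Prints the old and new value, performs the assignment, returns the new value. */
int print_and_assign_int(const char *s, int *t, int v)
{
    printf("Integer variable %s changes from %d to %d\n", s, *t, v);
    *t = v;
    return v;
}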
(If you want file and line numbers, too, you can just add them as extra arguments to the print function using the C preprocessor macros for file and line numbers.)
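A minimal sketch of that file-and-line variant follows. The names trace_assign_int and ASSIGN_INT are hypothetical, chosen for illustration; the output format copies the one asked for in the question.

```c
#include <assert.h>
#include <stdio.h>

/* Prints "file F, line N. Variable NAME changes from OLD to NEW",
   then performs the assignment and returns the new value. */
int trace_assign_int(const char *file, int line, const char *name,
                     int *target, int value)
{
    assert(target != NULL);
    printf("file %s, line %d. Variable %s changes from %d to %d\n",
           file, line, name, *target, value);
    *target = value;
    return value;
}

/* ASSIGN_INT(a, 23) behaves like (a = 23) plus one trace line.
   __FILE__ and __LINE__ expand at the call site, and #lhs turns the
   left-hand side into its source text. */
#define ASSIGN_INT(lhs, expr) \
    trace_assign_int(__FILE__, __LINE__, #lhs, &(lhs), (expr))
```

Because the macro is an expression that yields the assigned value, it can stand wherever the original assignment stood, including inside an if condition. Both lhs and expr are evaluated exactly once.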
For a C statement like:
if (x=getc(stdin))
{ y="abc";
  p=&y;
}
a set of rewrite rules done this way would automatically produce something like:
if (print_and_assign_char("x", &x, getc(stdin)))
{ print_and_assign_charstar("y", &y, "abc");
  print_and_assign_ptrtocharstar("p", &p, &y);
}
You'd have to decide how you want to print out assigned pointer values, because you must assume they do not have equivalent addresses on the two systems, so you essentially have to print the value selected by the pointer. That gets you into trouble whenever you have void*; but you can print out what you know about the void* variable, e.g., whether it is NULL or not, and that would still be useful trace data.
This might all be worth the trouble if you did this kind of debugging a lot. IMHO, you're probably better off just biting the bullet and debugging your way to a solution, as I expect you will get surprised by some architecture dependence.
If you want cross platform, printf is your friend. A little work with grep will find where the variables are being assigned to. Aside from that I'd think your effort would be best spent figuring out how to shorten your edit/compile*2/run_tests*2/diff cycle and putting some thought into where to do the binary splits.
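For the printf-and-diff approach to work across platforms, the trace lines themselves must be formatted identically on both systems. Floating-point values are the usual trouble spot, since decimal output can hide a one-bit difference or vary between C runtimes. A sketch that prints the raw bit pattern instead (format_double_bits and TRACE_DOUBLE are hypothetical names; the code assumes double is a 64-bit IEEE 754 type and that snprintf is available):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats "NAME = 0x<16 hex digits>" so every bit of v is visible.
   Assumption: double and unsigned long long are both 8 bytes. */
int format_double_bits(char *buf, size_t n, const char *name, double v)
{
    unsigned long long bits = 0;
    assert(buf != NULL && name != NULL);
    assert(sizeof bits == sizeof v);
    memcpy(&bits, &v, sizeof v);   /* well-defined way to reinterpret the bytes */
    return snprintf(buf, n, "%s = 0x%016llx", name, bits);
}

/* Writes one trace line for a double variable to the stream `out`.
   Only the line number is printed, because __FILE__ paths usually
   differ between the two build trees and would pollute the diff. */
#define TRACE_DOUBLE(out, var)                                  \
    do {                                                        \
        char trace_buf_[128];                                   \
        format_double_bits(trace_buf_, sizeof trace_buf_, #var, (var)); \
        fprintf((out), "line %d: %s\n", __LINE__, trace_buf_);  \
    } while (0)
```

Writing TRACE_DOUBLE(stderr, total); after each suspicious computation and redirecting stderr to a file on each machine gives two logs that diff cleanly up to the first real divergence.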
A while back, I had more or less the same problem to solve, but with the added complexity of having an incomplete language translator in the middle. Being able to run both versions and diff the output quickly made it a very reasonable problem.
You can tell gdb to break whenever some instruction modifies your variable. IIRC the command is 'watch'. E.g. 'watch A', or 'watch *(int*)0x123456'.
And you can even tell it to break when someone reads it, with 'rwatch'.
You can tell gcc to instrument function calls: -finstrument-functions. This doesn't get you down to the granularity of individual assignments, but it comes close if you pack some elementary functionality into inline functions.
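With -finstrument-functions, GCC inserts calls to two hooks that you supply, __cyg_profile_func_enter and __cyg_profile_func_exit. A minimal sketch of such hooks is below; the trace_depth counter and the NO_INSTRUMENT macro are illustrative additions, not part of GCC's interface.

```c
#include <assert.h>
#include <stdio.h>

/* The hooks themselves must not be instrumented, or they would recurse. */
#if defined(__GNUC__)
#define NO_INSTRUMENT __attribute__((no_instrument_function))
#else
#define NO_INSTRUMENT
#endif

/* Current call nesting depth, used only to indent the trace. */
static int trace_depth;

/* Called on entry to every function compiled with -finstrument-functions. */
NO_INSTRUMENT void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    fprintf(stderr, "%*senter %p (called from %p)\n",
            trace_depth * 2, "", this_fn, call_site);
    trace_depth++;
}

/* Called just before every instrumented function returns. */
NO_INSTRUMENT void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)call_site;
    assert(trace_depth > 0);   /* enter and exit calls should be balanced */
    trace_depth--;
    fprintf(stderr, "%*sexit  %p\n", trace_depth * 2, "", this_fn);
}
```

The hooks receive only addresses, so the raw trace will not match between machines; the addresses would need to be mapped back to function names (for example with addr2line) before diffing. This is a GCC feature, so it covers only the Linux half of the comparison unless the Windows build also uses GCC.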
I think the program ctrace might be what you're after; it was available in AT&T Unix (back in the days when AT&T owned Unix), but the URL is to the Sun manual page. You can probably find it, therefore, on the other proprietary versions of Unix (AIX, HP-UX, SCO); it is not clear that there is a version for Linux.
The CTrace library at SourceForge is not the same thing at all.
If you know the lines where the variables change that you are interested in, then you can use the breakpoint command to do a simple tracing:
Example:
#include <iostream>

int main(int, char **)
{
    for(int i = 0; i < 100; ++i)
    {
        std::cout << i << std::endl;
    }
    return 0;
}
When you compile this program like
c++ -g -o t1 t1.cpp
then you can use a breakpoint command like this:
break 7
commands
print i
continue
end
to generate a simple trace. This should also work for watchpoints (breakpoints that get triggered when a variable changes state).
Here's the log of an example gdb session:
$ gdb t1
GNU gdb (GDB) 7.1-ubuntu
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /tmp/t1...done.
(gdb) break 7
Breakpoint 1 at 0x40087c: file t1.cpp, line 7.
(gdb) commands
Type commands for when breakpoint 1 is hit, one per line.
End with a line saying just "end".
>print i
>continue
>end
(gdb) set pagination off
(gdb) r
Starting program: /tmp/t1
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$1 = 0
0
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$2 = 1
1
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$3 = 2
2
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$4 = 3
3
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$5 = 4
4
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$6 = 5
5
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$7 = 6
6
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$8 = 7
7
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$9 = 8
8
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$10 = 9
9
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$11 = 10
10
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$12 = 11
11
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$13 = 12
12
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$14 = 13
13
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$15 = 14
14
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$16 = 15
15
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$17 = 16
16
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$18 = 17
17
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$19 = 18
18
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$20 = 19
19
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$21 = 20
20
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$22 = 21
21
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$23 = 22
22
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$24 = 23
23
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$25 = 24
24
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$26 = 25
25
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$27 = 26
26
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$28 = 27
27
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$29 = 28
28
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$30 = 29
29
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$31 = 30
30
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$32 = 31
31
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$33 = 32
32
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$34 = 33
33
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$35 = 34
34
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$36 = 35
35
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$37 = 36
36
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$38 = 37
37
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$39 = 38
38
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$40 = 39
39
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$41 = 40
40
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$42 = 41
41
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$43 = 42
42
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$44 = 43
43
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$45 = 44
44
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$46 = 45
45
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$47 = 46
46
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$48 = 47
47
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$49 = 48
48
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$50 = 49
49
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$51 = 50
50
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$52 = 51
51
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$53 = 52
52
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$54 = 53
53
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$55 = 54
54
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$56 = 55
55
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$57 = 56
56
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$58 = 57
57
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$59 = 58
58
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$60 = 59
59
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$61 = 60
60
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$62 = 61
61
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$63 = 62
62
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$64 = 63
63
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$65 = 64
64
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$66 = 65
65
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$67 = 66
66
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$68 = 67
67
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$69 = 68
68
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$70 = 69
69
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$71 = 70
70
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$72 = 71
71
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$73 = 72
72
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$74 = 73
73
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$75 = 74
74
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$76 = 75
75
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$77 = 76
76
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$78 = 77
77
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$79 = 78
78
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$80 = 79
79
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$81 = 80
80
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$82 = 81
81
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$83 = 82
82
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$84 = 83
83
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$85 = 84
84
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$86 = 85
85
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$87 = 86
86
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$88 = 87
87
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$89 = 88
88
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$90 = 89
89
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$91 = 90
90
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$92 = 91
91
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$93 = 92
92
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$94 = 93
93
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$95 = 94
94
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$96 = 95
95
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$97 = 96
96
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$98 = 97
97
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$99 = 98
98
Breakpoint 1, main () at t1.cpp:7
7 std::cout << i << std::endl;
$100 = 99
99
Program exited normally.
(gdb) q
First, I assume you actually use the same code in both cases. Otherwise, start looking where the code differs.
If your program gives different results on different architectures, then there's a chance that you're not dealing with some things the right way. The first thing I'd do is turn on all possible compiler warnings and pay attention to them.
If that yields nothing, I'd try a static code analysis tool. I've used Coverity (a commercial product), but there are others. These tools will sometimes help you find errors in your programs that the compilers don't spot.
Once those automated options were exhausted, I might try comparing all variables in the whole program. Depending on your program size, that could be really time-consuming.