C File With No #?
Suppose that you are given a single C source file that contains at most 300 lines of code.
Suppose also that the file, while implementing several functions, DOES NOT contain the character '#' (meaning there are NO #include statements, and no other statements that use '#' anywhere in the file).
My question is: does the above guarantee that the file does not do any I/O? Does it guarantee that the file will not be able to (say) erase the contents of the hard drive, or do other fishy things?
(I am supposed to get 100-200 single C files that, as mentioned, do not include the char # in them. I was asked to write a simple program that will programmatically check whether a single C source file with no # is potentially involved in I/O, accessing the network, etc.)
Given that no statements with # are allowed, what is the WORST code a coder can include in such a C file to potentially damage the system of whoever runs it?
I know that no check will yield 100% accuracy, but I am interested in at least doing some basic checks that will raise a red flag if certain expressions / keywords are found. Any ideas of what to look for?
No, it can't guarantee that. You can take code in which all includes and macros have already been expanded, put it into a single huge file, and compile it. That file won't contain any preprocessor directives, yet it can do anything C can usually do on a system.
If the original coder were to include inline assembly, they could do pretty much anything they liked, without importing any libraries.
One could just copy and paste the definitions of standard file types and functions (e.g. FILE, fopen(), fprintf(), fclose(), etc.) into a C file. That way no include is needed, and when the file is compiled and linked against the proper libraries, it will be able to perform I/O.
# is not the only token that can start a preprocessor directive. ??= (a trigraph) and %: (a digraph) are equivalent spellings in the standard. (But they are not recognized by all compilers.)
C allows unsafe operations with pointers. For example on a system without ASLR it's trivial to get the pointer to arbitrary library functions. It's not very robust since any memory access violation will kill you, but at least if you know the target system it's possible.
ASLR makes it slightly more difficult, but I assume you could just get a pointer to the current position on the stack and then crawl upward until you reach stack belonging to the entry point of your thread. Which will have some interesting pointers for sure.
Absence of preprocessor directives doesn't guarantee anything except the absence of preprocessor directives.
You could still manually add the data types and function prototypes for any library functions you're interested in. If you're familiar with the underlying platform, you could bypass the standard library entirely and make system calls directly.
Once upon a time I saw code (probably for the IOCCC) that used an array of unsigned char to store raw opcodes and then used type punning to treat it as a function, something like
unsigned char instr[] = {0x00, 0x12, 0x33, ...};
void (*foo)(void) = (void (*)(void)) instr;
foo();
Note that this relied on undefined behavior and a host of non-portable assumptions, and I'm not even sure such an approach would work anymore. But if it did, this isn't something that would be easy to catch with a simple source scan.
EDIT
I found the code I was thinking of - it was an IOCCC entry from 1984. It doesn't work the way I described, though. Hey, I'm getting old, and stuff isn't sticking to my brain the way it used to.
short main[] = {
277, 04735, -4129, 25, 0, 477, 1019, 0xbef, 0, 12800,
-113, 21119, 0x52d7, -1006, -7151, 0, 0x4bc, 020004,
14880, 10541, 2056, 04010, 4548, 3044, -6716, 0x9,
4407, 6, 5568, 1, -30460, 0, 0x9, 5570, 512, -30419,
0x7e82, 0760, 6, 0, 4, 02400, 15, 0, 4, 1280, 4, 0,
4, 0, 0, 0, 0x8, 0, 4, 0, ',', 0, 12, 0, 4, 0, '#',
0, 020, 0, 4, 0, 30, 0, 026, 0, 0x6176, 120, 25712,
'p', 072163, 'r', 29303, 29801, 'e'
};
Here's the explanation:
The Grand Prize: Sjoerd Mullender & Robbert van Renesse
Without question, this C program is the most obfuscated C program that has ever been received! Like all great contest entries, they result in a change of rules for the following year. To prevent a flood of similar programs, we requested that programs be non machine specific.
This program was selected for the 1987 t-shirt collection.
NOTE: If your machine is not a Vax-11 or pdp-11, this program will not execute correctly. In later years, machine dependent code was discouraged.
The C startup routine (via crt0.o) transfers control to a location named main. In this case, main just happens to be in the data area. The array of shorts, which has been further obfuscated by use of different data types, just happens to form a meaningful set of PDP-11 and Vax instructions. The first word is a PDP-11 branch instruction that branches to the rest of the PDP code. On the Vax main is called with the calls instruction which uses the first word of the subroutine as a mask of registers to be saved. So on the Vax the first word can be anything. The real Vax code starts with the second word. This small program makes direct calls to the write() Unix system call to produce a message on the screen. Can you guess what is printed? We knew you couldn't! :-)
Copyright (c) 1984, Landon Curt Noll. All Rights Reserved. Permission for personal, educational or non-profit use is granted provided this this copyright and notice are included in its entirety and remains unaltered. All other uses must receive prior permission in writing from both Landon Curt Noll and Larry Bassel.
Again, I don't know if this trick would work on any modern desktop OS, but it would be fun to find out.
Not necessarily. Most compilers generate warnings for implicit declarations, but link in the functions anyway. You can put together a list of I/O-performing functions and check whether they're called, but that still doesn't preclude inline asm from invoking I/O-related system calls.
You should probably run the code with low privileges in a sandbox, and look at what syscalls it makes with something like strace.
The following program is a valid C program that produces output on stdout. It contains no # characters:
int puts(const char *s);
int main(void)
{
puts("hi");
return 0;
}
It doesn't even produce a warning from the compiler (/Wall /W3 on MSVC and -Wall -Wextra on MinGW), much less an error.
You can also try compiling the C files into a static binary, disassembling it, and checking for system call instructions (on x86, for example, sysenter, int, or syscall). I/O cannot be done purely from userspace; a process has to enter the kernel to do any kind of I/O.
However, this still doesn't protect against execution of instructions in non-text portions of your binary. In the worst case, instructions may be fabricated at runtime and then executed. For that, I think the best bet is to measure code coverage while tracing the process for system calls. Linux has strace, which can help with that.