
Why does printf work on a non-terminated string?

I am wondering how printf() figures out when to stop printing a string when I haven't put a terminating character at the end of it. As an experiment, I malloc'ed 10 bytes of memory and put exactly 10 characters in it. Somehow printf could still print those characters without running out of bounds. Why?


There's a good chance that one of the bytes just after the string is NUL ('\0'), so printf stops there. Furthermore, any non-NUL bytes that follow the memory you malloc'ed might not be printable characters, so you won't notice them in the terminal.


It is because you were unlucky: the byte right after your malloc'ed string happened to be a 0 byte.

You can confirm that by doing:

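const char* digits = "0123456789";
char* buff = (char*)malloc(10);   /* no room for a terminator */
memcpy(buff, digits, 10);

/* Undefined behavior: both %s and buff[10] read past the allocation. */
printf("%s, %d\n", buff, (int)*(buff + 10));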

Your program will most likely print:

0123456789, 0

That 0 is the NUL byte, which you didn't allocate but which happened to be there. Note that this behavior is UNDEFINED, so you cannot rely on it. As I said before, this happened because you were unlucky! The best thing that can happen in this situation is a SIGSEGV, because it exposes the bug immediately.
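
If the buffer really has to stay unterminated, one well-defined way to print it is to give %s a precision, which caps how many bytes are read. The sketch below is only an illustration: the helper name format_bounded is hypothetical, and it formats into a caller-supplied buffer with snprintf so the result can be inspected.

```c
#include <stdio.h>

/* Hypothetical helper: format at most n bytes of a buffer that may lack
   a terminator. With a precision, %s does not read past the first n bytes.
   Returns whatever snprintf returns. */
int format_bounded(char *out, size_t out_size, const char *src, int n)
{
    return snprintf(out, out_size, "%.*s", n, src);
}
```

The same "%.*s" format works directly with printf, for example printf("%.*s\n", 10, buff).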


It isn't just luck that unterminated strings tend not to cause problems on small programs.

On most OSs/processors malloc rounds allocations up to a multiple of 4, 8 or 16 bytes (depending on the memory alignment requirements of the platform), so there are often (but not always) a few spare bytes at the end of the string.

Typically, when malloc needs more memory, the OS hands it one or more virtual pages (typically 4k). For security reasons the pages have to be wiped if they were last used by a different process (or have not been used since a warm reset?).

Therefore, because there are lots of zeros about (both in the allocated area and just after it), there is a good chance that non-terminated strings will not cause a problem at startup or in small, short-running programs (which, ironically, includes most test programs). The problem shows up later, when malloc reuses freed blocks.

To guard against this class of problem, development and test builds should use something like efence with the EF_FILL option to set the malloc'd memory to a non-zero value.
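
The idea of filling fresh allocations can be sketched without any tool. This is not how efence is implemented; debug_malloc and DEBUG_FILL_BYTE are hypothetical names, and the fill value is an arbitrary assumption. It only covers bytes inside the requested block, so it catches a missing terminator in a buffer that is larger than the string, not a read past the end.

```c
#include <stdlib.h>
#include <string.h>

#define DEBUG_FILL_BYTE 0xAA   /* assumed pattern; any non-zero value works */

/* Hypothetical debug wrapper: allocate, then fill the block with a
   non-zero pattern so stale zeros cannot hide a missing terminator. */
void *debug_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        memset(p, DEBUG_FILL_BYTE, size);
    return p;
}
```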

Similarly, it is a useful idea to initialise the stack to non-zero values because, on most machines with VM, the stack is built from 4k pages that are wiped before being allocated to a process.

Note that even with tools like efence there is still a problem with static variables. The whole area is wiped to zero as the program is loaded (and again the data is aligned), so an unterminated string will probably go unnoticed if a static string variable is written to only once. The problem will only be noticed if the variable is re-used to store a shorter unterminated string.
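
The static-variable case can be reproduced with fully defined behavior, because the zero-initialised tail of the buffer acts as an accidental terminator. The names below (name, set_name_unterminated, get_name) are made up for this sketch, and the setter is deliberately buggy.

```c
#include <string.h>

static char name[16];   /* static storage: zero-initialised at load time */

/* Deliberately buggy: copies n bytes but never writes a terminator.
   n is capped so the last byte of the buffer always stays zero. */
void set_name_unterminated(const char *src, size_t n)
{
    if (n > sizeof name - 1)
        n = sizeof name - 1;
    memcpy(name, src, n);
}

const char *get_name(void)
{
    return name;
}
```

After a first call with "Alexander" (9 bytes) the buffer reads back correctly. A second call with "Bob" (3 bytes) leaves the old tail in place, so the buffer reads back as "Bobxander".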

On a related issue, the alignment of variables explains why not allocating enough room for the terminating NUL of a string often goes undetected.
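
For reference, a copy that does reserve room for the terminator could look like the sketch below. The helper name duplicate_string is hypothetical; the point is only the len + 1 in both the allocation and the copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate a C string, reserving room for '\0'.
   Returns NULL if allocation fails; the caller frees the result. */
char *duplicate_string(const char *src)
{
    size_t len = strlen(src);
    char *copy = malloc(len + 1);   /* len alone would be one byte short */
    if (copy == NULL)
        return NULL;
    memcpy(copy, src, len + 1);     /* copies the terminator as well */
    return copy;
}
```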


Assuming you really did malloc 10 characters, and you really did set each and every one of them to a value other than NUL ('\0'), where is the guarantee that the byte you didn't allocate, the one immediately following in memory, wasn't NUL by chance?

You may also have used one of a number of function calls that are smart enough to set the last character to NUL, even if you passed enough data to possibly "set" it to be non-NUL. With so few details, we will never know.


The random garbage after the last byte of the string happened to begin with a NUL. It was luck. It could fail the next time you run the program, or work 100 times in a row. Welcome to pointer errors (and they can be difficult to debug, too).


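Well, set the whole malloc thing aside for a moment, because to printf it is all just a string. We all know %d, %x, %s and the other format specifiers, but the point is that printf is just an ordinary C function that takes a variable number of arguments.

In simpler words, printf walks its format string one char at a time until it reaches the terminating '\0'. For %s it does the same with the char * argument it was given: it keeps printing until it finds a '\0'. It is never told how big the buffer is.

Escape sequences such as \n and \t are single characters by the time printf sees them and are printed as-is. A conversion such as %c or %f is a '%' followed by a type character and is handled as a special case. A simplified version might look like this: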

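#include <stdarg.h>
#include <stdio.h>

char *convert(unsigned int num, int base);

void myprintf(const char *fmt, ...)
{
    const char *p;
    int i;
    unsigned u;
    char *s;
    va_list argp;

    va_start(argp, fmt);

    for (p = fmt; *p != '\0'; p++)
    {
        if (*p != '%')          /* ordinary character: print it as-is */
        {
            putchar(*p);
            continue;
        }

        p++;                    /* skip '%' and look at the conversion */
        if (*p == '\0')         /* lone '%' at the end of the format */
            break;

        switch (*p)
        {
        case 'c': i = va_arg(argp, int); putchar(i); break;
        case 'd': i = va_arg(argp, int);
                  u = (unsigned)i;
                  if (i < 0) { u = 0u - u; putchar('-'); }
                  fputs(convert(u, 10), stdout); break;
        case 'o': u = va_arg(argp, unsigned int); fputs(convert(u, 8), stdout); break;
        case 's': s = va_arg(argp, char *);
                  fputs(s, stdout);   /* stops only at the first '\0' */
                  break;
        case 'u': u = va_arg(argp, unsigned int); fputs(convert(u, 10), stdout); break;
        case 'x': u = va_arg(argp, unsigned int); fputs(convert(u, 16), stdout); break;
        case '%': putchar('%'); break;
        }
    }

    va_end(argp);
}

/* Convert num to text in the given base (2 to 16), using a static buffer. */
char *convert(unsigned int num, int base)
{
    static char buf[33];
    char *ptr;

    ptr = &buf[sizeof(buf) - 1];
    *ptr = '\0';
    do
    {
        *--ptr = "0123456789abcdef"[num % base];
        num /= base;
    } while (num != 0);
    return ptr;
}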

Hope this helps. If it doesn't, just let me know; I'd be glad to help :)
