开发者

Padding string left in C troubles

I wanted to create a string padding function for the use of left-padding a binary representation with zeros, padding to the defined byte size. I first tried printf but that did not allow zero padding on a string and was not flexible.

I had come up with the following function:

char * strpadleft(char * string, char pad, size_t bytes) {
 size_t ssize = strlen(string);
 size_t bits = bytes * 8;                            
 char *padded = (char *) malloc(bits + 1); /* Bit size + null terminator */
 memset(padded, pad, bits);                /* Fill contents with zeros, leave last null terminator*/
 padded -= ssize + 1;                      /* Rewind bac开发者_Python百科k to offset*/
 strncpy(padded, string, ssize);           /* Replace for example bits 16->32 with representation*/
 return padded;
}

/*Example: strpadleft("0100100001", '0', 4); */

Now unfortunately this returns simply the unpadded string (ex. 0100100001). Is my pointer arithmetic wrong, am I copying to the wrong location, or is there something else I missed that does not let this work?


Here is a break down of what is happening using your example call, strpadleft("0100100001", '0', 4);

ssize is set to 10

bits is set to 32

padded points to an allocated memory area of 33-bytes.

Here is a simple/crude representation of the allocated memory:

   ..............................................
ma 0000000000000000000000000000000000000000000000
ed xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
md 0000000000000000111111111111111222222222222222
yr 0123456789ABCDEF123456789ABCDEF123456789ABCDEF
              |                               |
              |                               padded allocation end
              padded allocation start

Given that example padded holds the address of 0x0B.

memset then sets all bytes of allocated memory to your pad character '0'.

   ...........00000000000000000000000000000000...
ma 0000000000000000000000000000000000000000000000
ed xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
md 0000000000000000111111111111111222222222222222
yr 0123456789ABCDEF123456789ABCDEF123456789ABCDEF
              |                               |
              |                               "padded" allocation end
              "padded" allocatoin start

Then you subtract 11 from the padded pointer, padded now holds the address of 0x00.

(This is the error in your logic, you wanted to increment the pointer not decrement. You also, as others have pointed out, do not want to modify padded to do this. Use a temp variable instead, or be sure to readjust padded after you do the string copy.)

   0100100001n00000000000000000000000000000000... (Note: 'n' represents the null character here)
ma 0000000000000000000000000000000000000000000000
ed xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
md 0000000000000000111111111111111222222222222222
yr 0123456789ABCDEF123456789ABCDEF123456789ABCDEF
   |          |                               |
   |          |                               "padded" allocation end
   |          "padded" allocatoin start
   "padded" now points here 

Then you are returning the address padded points to which as you can see points to the beginning of the original string.

On top of this not doing what you want, you also have corruppted memory by writing outside the bounds of the memory allocation.

I'd recommend using a debugger of your choice and trying to step through your code watching the key variables to solve these kinds of problems.


There is major misconception and some other problems:

  • memset() does not change padded

That is, the variable in your function is not changed; memset() just sets the data that padded points to.

The purported 'reset' operation padded -= ssize + 1 therefore invokes undefined behaviour by accessing memory that you did not allocate.

Use:

strcpy(padded + bits - ssize, string);

instead of the two lines:

padded -= ssize + 1;
strncpy(padded, string, ssize);

Using strcpy() is safe because you know all the sizes.

Note that malloc() does not return initialized data, you cannot guarantee that the last allocated byte will be zero. You would have to use calloc() for that.

Note that the memset() operation does NOT null terminate your string.

Note that using strncpy(), paradoxically, also does not guarantee null termination and indeed does not null terminate your string even if you got the start position correct. By contrast, using strcpy() does guarantee null termination.

Working code

Note the revised interface - using const char * for the first argument. (The static just gets the code to compile under my default compilation flags without a complaint of no prior declaration of the function. You would not use that for a library function declared in a header, of course.)

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *strpadleft(const char * string, char pad, size_t bytes)
{
    size_t ssize = strlen(string);
    size_t bits = bytes * 8;
    char *padded = (char *) malloc(bits + 1);
    assert(ssize < bits);
    memset(padded, pad, bits - ssize);
    strcpy(padded + bits - ssize, string);
    return padded;
}

int main(void)
{
    const char *data = "0100100001";
    char *pad = strpadleft(data, '0', 4);
    printf("Data: <<%s>> padded <<%s>> (%d)\n", data, pad, (int)strlen(pad));
    free(pad);
    return(0);
}

Commentary

You really need to decide what would be appropriate behaviour if ssize > bits (hint: assert() is not correct). Most probably, though, you would simply duplicate the original string. Note: it would absolutely NOT be acceptable to return a pointer to the original string. The function returns a pointer to a string that must be freed by the application; you must therefore always return an allocated string. Otherwise, your function becomes unusable; the code has to check whether the return value is the same as the argument and not release the return value if it is the same. Yuck!

Quasi-fixed code

Demonstrating the lack of null termination in the original code:

static char * strpadleft(const char * string, char pad, size_t bytes)
{
    size_t ssize = strlen(string);
    size_t bits = bytes * 8;
    char *padded = (char *) malloc(bits + 1);
    padded[bits] = 'X';  // Overwrite last allocated byte
    memset(padded, pad, bits);
    strncpy(padded + bits - ssize, string, ssize);
    return padded;
}

With the same test program as before, and relying on undefined behaviour (there was no guarantee that the byte after the X would be a null byte), I got:

Data: <<0100100001>> padded <<00000000000000000000000100100001X>> (33)

Note that the 'X' was not overwritten by the strncpy()! You could fix that with ssize + 1, but why not just use strcpy()...as already stated...


Change the line:

padded -= ssize + 1;
strncpy(padded, string, ssize);           /* Replace for example bits 16->32 with representation*/

to

char *data = padded + (bits - ssize);
strncpy(data , string, ssize);           /* Replace for example bits 16->32 with representation*/
padded [bits] = '\0';

Do not change padded because you are to return that value, that the reason because the variable data is created, and as phihag said memset do no change the address of padded.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜