开发者

Using C, convert a dynamically-allocated int array to a comma-separated string as cleanly as possible

I’m much less experienced in C than I am in higher-level languages. At Cisco, we use C, and I sometimes run into something that would be easy to do in Java or Python, but very difficult to do in C. Now is one of those times.

I have a dynamically-allocated array of unsigned integers which I need to convert to a comma-separated string for logging. While the integers are not likely to be very large, the开发者_如何转开发y could conceptually be anywhere from 0 to 4,294,967,295 In Python, that’s one short line.

my_str = ','.join(my_list)

How elegantly can people do this in C? I came up with a way, but it’s gross. If anyone knows a nice way to do it, please enlighten me.


Code now tested and builds under gcc.

In contrast to other answers, does not mandate C99.

The real problem here is not knowing the length of the string you're going to need. Getting a number is as easy as sprintf("%u", *num) using num to walk your array of ints, but how much space are you going to need? To avoid overrunning a buffer, you have to keep track of a lot of integers.

size_t join_integers(const unsigned int *num, size_t num_len, char *buf, size_t buf_len) {
    size_t i;
    unsigned int written = 0;

    for(i = 0; i < num_len; i++) {
        written += snprintf(buf + written, buf_len - written, (i != 0 ? ", %u" : "%u"),
            *(num + i));
        if(written == buf_len)
            break;
    }

    return written;
}

Notice, that I keep track of how much of the buffer I have used and use snprintf so I don't overrun the end. snprintf will tack on a \0, but since I'm using buf + written I will start at the \0 of the previous snprintf.

In use:

int main() {
    size_t foo;
    char buf[512];

    unsigned int numbers[] = { 10, 20, 30, 40, 1024 };

    foo = join_integers(numbers, 5, buf, 512);
    printf("returned %u\n", foo);
    printf("numbers: %s\n", buf);
}

Outputs:

returned 20
numbers: 10, 20, 30, 40, 1024

Forcing the limiting to kick in, instead of overrunning:

char buf[15];    
foo = join_integers(numbers, 5, buf, 14);
buf[14] = '\0';

Outputs, expectedly:

returned 14
numbers: 10, 20, 30, 4


You can actually use a library like Glib, which contains functions like

gchar* g_strjoin (const gchar *separator, ...);

Joins a number of strings together to form one long string, with the optional separator inserted between each of them. The returned string should be freed with g_free().

(you still need to use g_snprintf(), possibly with g_printf_string_upper_bound() to ensure space)


Are you guys getting paid by the line? :-)


f() is declared with a char * parameter for prototyping purposes, just change char -> int. I interpreted the question as requiring a string as output rather than just code to write to a file.

#define PRINT(s, l, x, i) snprintf((s), (l), "%s%d", (i) ? ",":"", (x)[i]);

char *f(size_t len, char *x) {
  size_t  i, j = 0, k;

  for(i = 0; i < len; ++i)
      j += PRINT(NULL, 0, x, i);
  char *result = malloc(k = ++j);
  for(*result = i = j = 0; i < len; ++i)
      j += PRINT(result + j, k - j, x, i);
  return result;
}

Here is a test framework:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

// put f() here

int main(int ac, char **av) {
    for(int i = 1; i < ac; ++i) { 
        printf("%s\n", f(strlen(av[i]), av[i]));
    }
    return 0;
}


What about this?

char *join_int_list(const unsigned int *list, size_t n_items)
{
     enum { SIZEOF_INT_AS_STR = sizeof("4294967295,")-1 };
     char *space = malloc(SIZEOF_INT_AS_STR * n_items);
     if (space != 0)
     {
         size_t i;
         char *pad = "";
         char *dst = space;
         char *end = space + SIZEOF_INT_AS_STR * n_items;
         for (i = 0; i < n_items; i++)
         {
              snprintf(dst, end - dst, "%s%u", pad, list[i]);
              pad = ",";
              dst += strlen(dst);
         }
         space = realloc(space, dst - space + 1);
     }
     return(space);
}

It is the responsibility of the caller to release the returned pointer - and to check that it is not null before using it. The 'realloc()' releases extra space if the amount allocated was enough too large to make it worth while. This code is blithely assuming that the values are indeed 32-bit unsigned integers; if they can be bigger, then the enum needs appropriate adjustment.

Tested code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *join_int_list(const unsigned int *list, size_t n_items)
{
    enum { SIZEOF_INT_AS_STR = sizeof("4294967295,")-1 };
    char *space = malloc(SIZEOF_INT_AS_STR * n_items);
    if (space != 0)
    {
        size_t i;
        char *pad = "";
        char *dst = space;
        char *end = space + SIZEOF_INT_AS_STR * n_items;
        for (i = 0; i < n_items; i++)
        {
            snprintf(dst, end - dst, "%s%u", pad, list[i]);
            pad = ",";
            dst += strlen(dst);
        }
        space = realloc(space, dst - space + 1);
    }
    return(space);
}

int main(void)
{
    static unsigned int array[]= { 1, 2, 3, 49, 4294967295U, 0, 332233 };
    char *str = join_int_list(array, sizeof(array)/sizeof(array[0]));
    printf("join: %s\n", str);
    free(str);
    return(0);
}

Checked with valgrind - seems to be OK.


Discussion of converting INT_MAX or UINT_MAX to string:

You can use sizeof("," STRINGIZE(INT_MAX)) instead of hard coding it. The stringize macro is a common cpp tool that can be defined as #define STRINGIZE_(v) #v and #define STRINGIZE(v) STRINGIZE_(v). – R. Pate

@R Pate: good idea - yes, you could do that quite effectively. Actually, there are two interesting ideas in there: the use of string concatenation with sizeof() (parentheses required for clarity - but string concatenation happens early enough that the compiler isn't worried) and the use of a stringize operation on INT_MAX. – Jonathan Leffler

Using a stringize operation on INT_MAX is not a good idea - it just has to be a "constant expression", not necessarily a sequence of digits. It could be defined as ((1<<32)-1), or even something whacky like __int_max, as long as the compiler lets you use it anywhere you can use a constant expression. – caf

And @caf is right. Consider this code:

#include <limits.h>
#include <stdio.h>

#undef INT_MAX
#define INT_MAX (INT_MIN-1 - 100 + 100)

#define QUOTER(x)   #x
#define STRINGIZER(x)   QUOTER(x)

enum { SIZEOF_INT_AS_STR = sizeof("4294967295,")-1 };
enum { SIZEOF_INT_AS_STR_1 = sizeof(STRINGIZER(INT_MAX) ",")-1 };

int main(void)
{
    printf("size = %d = %d\n", SIZEOF_INT_AS_STR, SIZEOF_INT_AS_STR_1);
    printf("INT_MAX  = %d\n", INT_MAX);
    printf("UINT_MAX = %u\n", UINT_MAX);
    return(0);
}

This doesn't even compile on MacOS X 10.5.8 with GCC 4.0.1 - because the identifier INT_MAX is not defined. A preliminary version of the code that did not print INT_MAX or UINT_MAX worked; it showed that the value of SIZEOF_INT_AS_STR_1 was 31 - so @caf was correct. Adding the double-checking on the values of INT_MAX and UINT_MAX then failed to compile, which did surprise me. A look at the output from gcc -E reveals why:

enum { SIZEOF_INT_AS_STR = sizeof("4294967295,")-1 };
enum { SIZEOF_INT_AS_STR_1 = sizeof("((-INT_MAX - 1)-1 - 100 + 100)" ",")-1 };

int main(void)
{
 printf("size = %d = %d\n", SIZEOF_INT_AS_STR, SIZEOF_INT_AS_STR_1);
 printf("INT_MAX  = %d\n", ((-INT_MAX - 1)-1 - 100 + 100));
 printf("UINT_MAX = %u\n", (((-INT_MAX - 1)-1 - 100 + 100) * 2U + 1U));
 return(0);
}

As predicted, the string for SIZEOF_IN_AS_STR_1 is not a digit string at all. The pre-processor can evaluate the expression (as much as it needs to), but does not have to produce a digit string.

The expansion of INT_MAX turns out to be in terms of INT_MIN, and INT_MIN is, in turn, defined in terms of INT_MAX, so when the rewritten INT_MAX macro is evaluated, the 'recursive expansion' is prevented by the C pre-processor rules of operation, and INT_MAX appears in the pre-processed output - to the confusion of all.

So, there are multiple reasons why it the superficially attractive idea turns out to be a bad idea.


unsigned *a; /* your input a[N] */
unsigned i,N;
char *b,*m;
b=m=malloc(1+N*11); /* for each of N numbers: 10 digits plus comma (plus end of string) */
for (i=0;i<N;++i)
  b+=sprintf(b,"%u,",a[i]);
if (N>0) b[-1]=0; /* delete last trailing comma */
/* now use m */
free(m);

pretty, right? :)


char buf [11 * sizeof (my_list)];
for (int n = 0, int j = 0;  j < sizeof (my_list) / sizeof (my_list [0]);  ++j)
    n += sprintf (&buf [n], "%s%u",   (j > 0) ? "," : "",  my_list [j]);


#include <stdio.h>
#include <stdlib.h>

/* My approach is to count the length of the string required. And do a single alloc.
     Sure you can allocate more, but I don't know for how long this data will be retained.
*/ 

#define LEN(a) (sizeof a / sizeof *a)

int main(void) {

    unsigned a[] = {1, 23, 45, 523, 544};
    int i, str_len=0, t_written=0;
    char tmp[11]; /* enough to fit the biggest unsigned int */

    for(i = 0; i < LEN(a); i++) 
        str_len += sprintf(tmp, "%d", a[i]);

    /* total: we need LEN(a) - 1 more for the ',' and + 1 for '\0' */
    str_len += LEN(a);
    char *str = malloc(str_len); 
    if (!str) 
        return 0;

    if (LEN(a) > 1) {
        t_written += sprintf(str+t_written, "%d", a[0]);
        for(i = 1; i < LEN(a); i++)
            t_written += sprintf(str+t_written, ",%d", a[i]);
    } else if (LEN(a) == 1) 
        t_written += sprintf(str+t_written, "%d", a[0]);

    printf("%s\n", str);

    free(str);
    return 0;
}


You guys and your needless special cases to take care of the trailing comma... It's cheaper just to destroy the last comma then to have a conditional check every time the loop runs.

:)

#include <stdio.h>

char* toStr(int arr[], unsigned int arrSize, char buff[])
{
    if (arr && arrSize && buff)
    {
        int* currInt = arr;
        char* currStr = buff;
        while (currInt < (arr + arrSize))
        {
            currStr += sprintf(currStr, "%d,", *currInt++);
        }
        *--currStr = '\0';
    }
    return buff;
}

int main()
{
    int arr[] = {1234, 421, -125, 15251, 15251, 52};
    char buff[1000];

    printf("Arr is:%s\n", toStr(arr, 6, buff));    
}

Assumes buff is big enough allocate it to be (length of largest int + 2) * arrSize). Inspired my memcpy :)

Edit I realized I had a brainfart earlier and could have just incremented by the return value of sprintf instead of storing temp. Apparently other answers do this as well, edited my answer to remove the 2 needless lines.

Edit2 Looks like wrang-wrang beat me to the punch! His answer is pretty much identical to mine and was submitted earlier. I humbly suggest giving him a + 1.


Assuming that when you mention "for logging" you mean writing to a log file, then your solution could look something like this (pseudocoded):

for (int x in array) {
    fprintf(log, "%d", x);
    if (! last element)
        fputc(log, ',');
}


Personally, for simplicity, and probably speed too, I would malloc a large buffer, with space for an array the size of "4,294,967,295" and ", " for each element. It's not space efficient during the creation of the list though!

Then I'd sprintf the ints into there, and append ", " to all elements

At last I'd realloc the pointer to have no more space than required. (size = strlen)

sprintf: On success, the total number of characters written is returned. This count does not include the additional null-character automatically appended at the end of the string.

That's how you keep track of where strcpy to in the string. :)

Hope that helps! :)

And if you just want to print them out, see the other replies instead. (for-loop and printf)


Unfortunately there will always be three cases:

  • Empty list (no commas, no items)
  • One item (no commas, one item)
  • Two or more items (n-1 commas, n items)

The join method hides this complexity for you, which is why it is so nice.

In C, I'd do:

for (i = 0; i < len; i++)
{
    if (i > 0)   /* You do need this separate check, unfortunately. */
        output(",");
    output(item[i]);
}

Where output is however you're appending to the string. It could be as simple as strcat on a pre-allocated buffer, or a printf to some stream (like a memory stream I learned about today in Creating a FILE * stream that results in a string :-).

If you were annoyed about doing that check every time for all i >= 1, you could do:

if (i > 0)
{
    output(item[0]);
    for (i = 1; i < len; i++)
    {
        output(",");
        output(item[i]);
    }
}


If you want it to a file, Steven Schlansker's answer is fine.

If you want to put it in a string, though, things get more complicated. You can use sprintf, but you need to be careful that you don't run out of room in your string. If you have a C99-compatble snprintf (Linux, BSD, not Windows), the following (untested, uncompiled) code should work:

char *buf = malloc(1024); /* start with 1024 chars */
size_t len = 1024;
int pos = 0;
int rv;
int i;
for (i = 0; i < n; i++) {
    rv = snprintf(buf+pos, len - pos, "%s%d", i = 0 ? "" : ",", my_list[i]);
    if (rv < len - pos) {
        /* it fit */
        pos += rv;
    } else {
        len *= 2;
        buf = realloc(buf, len);
        if (!buf) abort();
        i--; /* decrement i to repeat the last iteration of the loop */
    }
}
return buf;

The caller must then free buf.


void join(int arr[], int len, char* sep, char* result){
    if(len==0){
        *result='\0';
    } else {
        itoa(arr[0],result,10);
        if(len > 1){
            strcat(result,sep);
            join(arr+1,len-1,sep,result+strlen(result));
        }
    }
}


Here's a linear solution that allocates an exponentially increasing buffer (fewer calls to realloc, if that matters) for the caller to free. Test script included.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void ensure(bool pred, char *msg, char *file, int line) {
    if (!pred) {
        fprintf(stderr, "%s:%d: %s", file, line, msg);
        exit(1);
    }
}

char *arr_to_s(int len, int *arr, char *sep) {
    size_t sep_len = strlen(sep);
    int result_capacity = 16 + sep_len;
    int result_len = 0;
    char *result = malloc(result_capacity);
    ensure(result, "malloc", __FILE__, __LINE__);
    result[0] = '\0';
  
    for (int i = 0; i < len; i++) {
        char num[16+sep_len];
        int previous_len = result_len;
        result_len += sprintf(num, i < len - 1 ? "%d%s" : "%d", arr[i], sep);
      
        if (result_len >= result_capacity) {
            result_capacity <<= 1;
            result = realloc(result, result_capacity);
            ensure(result, "realloc", __FILE__, __LINE__);
        }
      
        strcat(result + previous_len, num);
    }
  
    return result;
}

void run_basic_tests(void) {
    int tests[][4] = {
        {0},
        {0, 1},
        {0, 1, 2},
        {0, 42, 2147483647, -2147483648},
    };
    
    for (int i = 0; i < 4; i++) {
        char *s = arr_to_s(i + 1, tests[i], ", ");
        printf("[%s]\n", s);
        free(s);
    }
}

void run_intensive_tests(int n) {
    srand(42);

    for (int i = 0; i < n; i++) {
        int len = rand() % 2000;
        int test[len];

        printf("[");

        for (int j = 0; j < len; j++) {
            test[j] = rand() % 2000000000 - 1000000000;
            printf(j < len - 1 ? "%d," : "%d", test[j]);
        }

        puts("]");
        
        char *s = arr_to_s(len, test, ",");
        printf("[%s]\n", s);
        free(s);
    }
}

int main(void) {
    //run_basic_tests();
    run_intensive_tests(10000);
    return 0;
}

Test runner:

#!/usr/bin/env bash

gcc -std=c99 -pedantic -Wall \
    -Wno-missing-braces -Wextra -Wno-missing-field-initializers -Wformat=2 \
    -Wswitch-default -Wswitch-enum -Wcast-align -Wpointer-arith \
    -Wbad-function-cast -Wstrict-overflow=5 -Wstrict-prototypes -Winline \
    -Wundef -Wnested-externs -Wcast-qual -Wshadow -Wunreachable-code \
    -Wlogical-op -Wfloat-equal -Wstrict-aliasing=2 -Wredundant-decls \
    -Wold-style-definition -Werror \
    -ggdb3 \
    -O0 \
    -fno-omit-frame-pointer -ffloat-store -fno-common -fstrict-aliasing \
    -lm \
    -o arr_to_s.out \
    arr_to_s.c

./arr_to_s.out > arr_to_s_test_out.txt
cat arr_to_s_test_out.txt | awk 'NR % 2 == 1' > arr_to_s_test_expected.txt
cat arr_to_s_test_out.txt | awk 'NR % 2 == 0' > arr_to_s_test_actual.txt
diff arr_to_s_test_expected.txt arr_to_s_test_actual.txt

Valgrind:

==573== HEAP SUMMARY:
==573==     in use at exit: 0 bytes in 0 blocks
==573==   total heap usage: 103,340 allocs, 103,340 frees, 308,215,716 bytes allocated
==573==
==573== All heap blocks were freed -- no leaks are possible
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜