
Declaring Pascal-style strings in C

In C, is there a good way to define length-prefixed, Pascal-style strings as constants, so they can be placed in ROM? (I'm working with a small embedded system and a non-GCC ANSI C compiler.)

A C string is NUL-terminated, e.g. {'f','o','o',0}.

A Pascal string has the length in the first byte, e.g. {3,'f','o','o'}.

I can declare a C-string to be placed in ROM with:

const char *s = "foo";

For a Pascal-string, I could manually specify the length:

const char s[] = {3, 'f', 'o', 'o'};

But this is awkward. Is there a better way? Perhaps in the preprocessor?


I think the following is a good solution, but make sure struct packing is enabled so that no padding is inserted between the members:

#include <stdio.h>

#define DEFINE_PSTRING(var,str) const struct {unsigned char len; char content[sizeof(str)];} var = {sizeof(str)-1, str}

DEFINE_PSTRING(x, "foo");
/*  Expands to following:
    const struct {unsigned char len; char content[sizeof("foo")];} x = {sizeof("foo")-1, "foo"};
*/

int main(void)
{
    printf("%d %s\n", x.len, x.content);
    return 0;
}

One catch is that it adds an extra NUL byte after your string, but that can be desirable because content can then be used as a normal C string too. You also need to cast it to whatever type your external library expects.
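
If the trailing NUL is unwanted, C (but not C++) lets a char array be initialized from a string literal that fills it exactly, in which case the terminator is simply not stored. A minimal sketch of that variant follows. The names DEFINE_PSTRING_NOTERM, ps_foo and ps_foo_check are made up for illustration, and it assumes the compiler inserts no padding between the two members, which is worth checking on the target:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical variant of DEFINE_PSTRING: content[] has room for the
   characters only, so the literal's terminating NUL is not stored.
   Valid C, rejected by C++; does not work for the empty string. */
#define DEFINE_PSTRING_NOTERM(var, str) \
    const struct { unsigned char len; char content[sizeof(str) - 1]; } \
    var = { sizeof(str) - 1, str }

DEFINE_PSTRING_NOTERM(ps_foo, "foo");

/* Start-up sanity check: the object's bytes should be {3, 'f', 'o', 'o'}. */
void ps_foo_check(void)
{
    const unsigned char *raw = (const unsigned char *)&ps_foo;

    assert(raw[0] == 3);
    assert(memcmp(raw + 1, "foo", 3) == 0);
}
```

Some compilers warn about the dropped terminator, so this form may need a warning switched off.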


Clang and Apple's builds of GCC (and possibly others) accept the -fpascal-strings option, which lets you write Pascal-style string literals by making \p the first thing in the string, e.g. "\pfoo". Not exactly portable, but certainly nicer than funky macros or constructing the strings at run time.

See here for more info.


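You can still use a const char * literal with an escape sequence as its first character that gives the length. Use an octal escape (or split the literal as "\x03" "foo"), because a hex escape swallows every hex digit that follows it: "\x03foo" would be read as \x03f followed by "oo".

const char *pascal_string = "\003foo";

It will still be NUL-terminated, but that probably doesn't matter.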
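
A hand-written length byte can silently drift out of step with the text when the string is edited. One way to catch that is to compare the prefix with the array size, for example once at start-up or in a host-side test. A minimal sketch, assuming the strings are declared as arrays rather than pointers (so that sizeof sees the whole literal); PSTR_PREFIX_OK and the variable names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Octal escapes stop after three digits, so any text may follow. */
static const char ps_foo[] = "\003foo";
/* Splitting the literal also keeps the escape from absorbing the text. */
static const char ps_fab[] = "\003" "fab";

/* Hypothetical helper: the array holds the prefix byte, the text and the
   terminating NUL, so the prefix should equal sizeof - 2. */
#define PSTR_PREFIX_OK(arr) \
    ((size_t)(unsigned char)(arr)[0] == sizeof(arr) - 2)

/* Aborts (in builds without NDEBUG) if a prefix does not match its text. */
void check_pascal_literals(void)
{
    assert(PSTR_PREFIX_OK(ps_foo));
    assert(PSTR_PREFIX_OK(ps_fab));
}
```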


It may sound a little extreme, but if you have many strings of this kind that need frequent updating, consider writing your own small tool (a Perl script, maybe?) that runs on the host system, parses an input file in a custom format of your own design, and outputs a .c file. You can integrate it into your makefile or whatever and live happily ever after :)

I'm talking about a program that will convert this input (or another syntax that you prefer):

s = "foo";
x = "My string";

To this output, which is a .c file:

const char s[] = {3, 'f', 'o', 'o'};
const char x[] = {9, 'M', 'y', ' ', 's', 't', 'r', 'i', 'n', 'g'};
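
The output half of such a tool is small. Here is a rough sketch of it, written in C rather than Perl so it matches the rest of the thread; emit_pstring is a hypothetical name, and parsing the input file is left out. It assumes strings of at most 255 bytes and writes quotes, backslashes and non-printable bytes as plain numbers to avoid escaping issues:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Writes one array definition for `text` to `out`.
   Returns 0 on success, -1 if the text is longer than 255 bytes. */
int emit_pstring(FILE *out, const char *name, const char *text)
{
    size_t len;
    size_t i;

    assert(out != NULL && name != NULL && text != NULL);
    len = strlen(text);
    if (len > 255) {
        return -1;
    }
    fprintf(out, "const char %s[] = {%lu", name, (unsigned long)len);
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)text[i];
        if (isprint(c) && c != '\'' && c != '\\') {
            fprintf(out, ", '%c'", c);
        } else {
            fprintf(out, ", %u", (unsigned)c);  /* awkward bytes as numbers */
        }
    }
    fprintf(out, "};\n");
    return 0;
}
```

Calling emit_pstring(stdout, "s", "foo") writes the first output line shown above.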


My approach would be to create functions for dealing with Pascal strings:

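#include <stdio.h>

/* Copies the C string cstr into pstr as a Pascal string.
   pstr needs room for strlen(cstr) + 1 bytes; cstr must be at most 255 chars. */
void cstr2pstr(const char *cstr, char *pstr) {
    int i;
    for (i = 0; cstr[i]; i++) {
        pstr[i+1] = cstr[i];
    }
    pstr[0] = (char)i;
}

/* Copies the Pascal string pstr into cstr and NUL-terminates it. */
void pstr2cstr(const char *pstr, char *cstr) {
    int i;
    int len = (unsigned char)pstr[0];  /* plain char may be signed */
    for (i = 0; i < len; i++) {
        cstr[i] = pstr[i+1];
    }
    cstr[i] = 0;
}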

Then I could use it this way:

int main(void) {
    char cstr[] = "ABCD", pstr[5], back[5];
    cstr2pstr(cstr, pstr);
    pstr2cstr(pstr, back);
    printf("%s\n", back);
    return 0;
}

This seems simple, straightforward, less error-prone and not especially awkward. It may not be the solution to your problem, but I would recommend that you at least consider it.
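
The two conversions above trust the caller to supply large enough buffers. A bounded variant might look like the sketch below; the _n names and the size parameters are my own additions, not part of the functions above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded variant of cstr2pstr.
   pstr_size is the capacity of pstr in bytes, including the length byte.
   Returns 0 on success, -1 if the text does not fit. */
int cstr2pstr_n(const char *cstr, char *pstr, size_t pstr_size)
{
    size_t len;

    assert(cstr != NULL && pstr != NULL);
    len = strlen(cstr);
    if (len > 255 || len + 1 > pstr_size) {
        return -1;
    }
    pstr[0] = (char)(unsigned char)len;
    memcpy(pstr + 1, cstr, len);
    return 0;
}

/* Hypothetical bounded variant of pstr2cstr.
   cstr_size must include room for the terminating NUL. */
int pstr2cstr_n(const char *pstr, char *cstr, size_t cstr_size)
{
    size_t len;

    assert(pstr != NULL && cstr != NULL);
    len = (unsigned char)pstr[0];  /* avoids a negative length if char is signed */
    if (len + 1 > cstr_size) {
        return -1;
    }
    memcpy(cstr, pstr + 1, len);
    cstr[len] = '\0';
    return 0;
}
```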


You can apply sizeof to string literals as well. This makes the declaration a little less awkward:

const char s[] = {sizeof "foo" - 1u, 'f', 'o', 'o'};

Note that sizeof applied to a string literal includes the terminating NUL character, which is why you have to subtract 1. Still, it's a lot of typing, and obfuscated :-)
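
Since the literal and the character list are typed separately, they can disagree. A C89-compatible compile-time check can guard against that. The sketch below uses the old negative-array-size trick; the names s_length_check and s_runtime_check are made up for illustration:

```c
#include <assert.h>

const char s[] = {sizeof "foo" - 1u, 'f', 'o', 'o'};

/* sizeof s counts the length byte plus the characters; sizeof "foo" counts
   the characters plus the NUL. If the two differ, the array size below
   becomes -1 and compilation fails. */
typedef char s_length_check[(sizeof s == sizeof "foo") ? 1 : -1];

/* Run-time equivalent, for compilers that reject the typedef trick. */
void s_runtime_check(void)
{
    assert((unsigned char)s[0] == sizeof s - 1);
}
```

This catches a wrong number of characters, not a wrong character.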


One option might be to abuse the preprocessor. If you declare a struct of the right size and populate it at initialization, it can be const.

#include <stdio.h>

/* Declares a {len, data} struct initialized from the literal X.
   No trailing semicolon here; the caller supplies it. */
#define DECLARE_PSTR(id,X) \
    struct pstr_##id { unsigned char len; char data[sizeof(X)]; }; \
    static const struct pstr_##id id = {sizeof(X)-1, X}

#define GET_PSTR(id) ((const char *)&(id))

#pragma pack(push)
#pragma pack(1)
DECLARE_PSTR(bob, "foo");
#pragma pack(pop)

int main(void)
{
    const char *s = GET_PSTR(bob);
    int len;

    len = (unsigned char)*s++;   /* first byte is the length */
    printf("len=%d\n", len);
    while(len--)
        putchar(*s++);
    return 0;
}


This is why flexible array members were introduced in C99 (and to avoid the use of the "struct hack"); IIRC, Pascal strings were limited to a maximum length of 255.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>  // For CHAR_BIT

struct pstring {
    unsigned char len;
    char dat[];
};

struct pstring* pstring_new(const char* src, size_t len)
{
    struct pstring* this;
    /* largest value that fits in the ->len field (sizeof does not evaluate this) */
    const size_t max_len = (1u << (CHAR_BIT * sizeof this->len)) - 1;

    if (!len) {
        len = strlen(src);
    }

    /* if the size does not fit in the ->len field: just truncate ... */
    if (len > max_len) {
        len = max_len;
    }

    this = malloc(sizeof *this + len);
    if (!this) {
        return NULL;
    }

    this->len = len;
    memcpy(this->dat, src, len);
    return this;
}

int main(void)
{
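    struct pstring* pp = pstring_new("Hello, world!", 0);

    if (!pp) {
        return EXIT_FAILURE;
    }

    /* dat is not NUL-terminated, so the precision limits what %s reads;
       the * width and precision arguments must be int */
    printf("%p:[%u], %*.*s\n", (void*)pp,
           (unsigned int)pp->len,
           (int)pp->len,
           (int)pp->len,
           pp->dat);

    free(pp);
    return 0;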
}


You can define an array any way you like, but note that this syntax does not work:

const char *s = {3, 'f', 'o', 'o'};

You need an array instead of a pointer:

const char s[] = {3, 'f', 'o', 'o'};

Note that a char can only store values up to 255 (and only if it is unsigned; a signed char typically stops at 127), so this will be your maximum string length.

Don't expect this to work where other strings would, however. A C string is expected to end with a null character, not only by the compiler but by everything else.
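
Because functions such as strlen or printf("%s") cannot be used on such an array, code that handles it has to work from the length byte. A few helpers along those lines, as a rough sketch; the function names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

const char s[] = {3, 'f', 'o', 'o'};

/* Length of a Pascal string; the cast keeps lengths above 127 positive
   when plain char is signed. */
size_t pstr_len(const char *p)
{
    assert(p != NULL);
    return (unsigned char)p[0];
}

/* Writes the text to `out` without needing a terminator. */
void pstr_write(FILE *out, const char *p)
{
    fwrite(p + 1, 1, pstr_len(p), out);
}

/* Nonzero if the Pascal string p holds the same text as the C string c. */
int pstr_equals_cstr(const char *p, const char *c)
{
    size_t n = pstr_len(p);

    return strlen(c) == n && memcmp(p + 1, c, n) == 0;
}
```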


Here's my answer, complete with an append operation that uses alloca() for automatic storage.

#include <stdio.h>
#include <string.h>
#include <alloca.h>

struct pstr {
  unsigned length;
  char *cstr;
};

#define PSTR(x) ((struct pstr){sizeof x - 1, x})

struct pstr pstr_append (struct pstr out,
             const struct pstr a,
             const struct pstr b)
{
  memcpy(out.cstr, a.cstr, a.length); 
  memcpy(out.cstr + a.length, b.cstr, b.length + 1); 
  out.length = a.length + b.length;
  return out;
}

#define PSTR_APPEND(a,b) \
  pstr_append((struct pstr){0, alloca(a.length + b.length + 1)}, a, b)

int main()
{
  struct pstr a = PSTR("Hello, Pascal!");
  struct pstr b = PSTR("I didn't C you there.");

  struct pstr result = PSTR_APPEND(PSTR_APPEND(a, PSTR(" ")), b);

  printf("\"%s\" is %u chars long.\n", result.cstr, result.length);
  return 0;
} 

You could accomplish the same thing using C strings and strlen. Because both alloca and strlen work best with short strings, I think that would make more sense.
