开发者

How would one implement Lazy Evaluation in C?

Take for example,

The follow python code:

def multiples_of_2():
  i = 0
  while True:
    i = i + 2
    yield i

How do we translate this into C code?

Edit: I am looking to translate this python code into a similar generator in C, with next() function. What I am not looking for is how to create a function in C to output multiples of 2. Multiples of 2 is merely an example to illustrate the problem of lazy eval gener开发者_开发知识库ators in C.


You could try to encapsulate this in a struct:

typedef struct s_generator {
    int current;
    int (*func)(int);
} generator;

int next(generator* gen) {
    int result = gen->current;
    gen->current = (gen->func)(gen->current);
    return result;
}

Then you define you multiples with:

int next_multiple(int current) { return 2 + current; }
generator multiples_of_2 = {0, next_multiple};

You get the next multiple by calling

next(&multiples_of_2);


I found a good article recently on coroutines in C, which describes one method of doing this. It's certainly not for the faint of heart.


As Will mentioned, languages like python do the job of storing the state of the stack between successive calls of the generator. Since C does not have this mechanism, you'll have to do it yourself. The "generic" way of doing this is not for the faint-hearted, as Greg pointed out. The traditional C way of doing this would be for you to define and maintain the state yourself and pass it in and out of your method. So:

struct multiples_of_two_state {
       int i;
       /* all the state you need should go in here */
};

void multiples_of_two_init(struct multiples_of_two_state *s) {
    s->i = 0;
}

int multiples_of_two_next(struct multiples_of_two_state *s) {
    s->i += 2;
    return s->i;
}

/* Usage */
struct multiples_of_two_state s;
int result;
multiples_of_two_init(&s);
for (int i=0; i<INFINITY; i++) {
    result = multiples_of_two_next(&s);
    printf("next is %d", result);
}


The basic approach is to not do it. In Python (and C#) the 'yield' method stores local state between calls, whereas in C/C++ and most other languages the local state stored on the stack is not preserved between calls and this is a fundemental implementation difference. So in C you'd have to store the state between calls in some variable explicitly - either a global variable or a function parameter to your sequence generator. So either:

int multiples_of_2() {
   static int i = 0;
   i += 2;
   return i;
}

or

int multiples_of_2(int i) {
   i += 2;
   return i;
}

depending upon if there's one global sequence or many.

I've quickly considered longjmp and GCC computed gotos and other non-standard things, and I can't say I'd recommend any of them for this! In C, do it the C way.


Check out setjmp/longjmp

setjmp.h is a header defined in the C standard library to provide "non-local jumps," or control flow besides the usual subroutine call and return sequence. The paired functions setjmp and longjmp provide this functionality. First setjmp saves the environment of the calling function into a data structure, and then longjmp can use this structure to "jump" back to the point it was created, at the setjmp call.

(Lua coroutines were implemented that way)


The key is keeping the state of the function between calls. You have a number of options:

  1. Static (or global) state. Means the sequence of calls to the function is not reentrant, i.e. you can't have the function call itself recursively, nor can you have more than one caller running different sequences of calls.

  2. Initialising (and possibly allocating) the state on or before the first call, and passing that to the function on each subsequent call.

  3. Doing clever stuff with setjmp/longjmp, the stack, or modifiable code (there's an article somewhere about currying functions in C that creates an object with the necessary code to call the curried function; a similar technique could create an object with the function's state and the necessary code to save and restore it for each call). (Edit Found it -- http://asg.unige.ch/site/papers/Dami91a.pdf)

Greg cites an interesting article, above, that presents a way of using static state with syntax similar to the yield statement. I liked it academically but probably wouldn't use it because of the reentrancy issue, and because I'm still surprised that the infamous Duffy's Device even compiles ;-).

In practice, large C programs do want to compute things lazily, e.g. a database server may want to satisfy a SELECT ... LIMIT 10 query by wrapping the plain SELECT query inside something that will yield each row until 10 have been returned, rather than computing the whole result and then discarding most of them. The most C-like technique for this is explicitly create an object for the state, and to call a function with it for each call. For your example, you might see something like:

/* Definitions in a library somewhere. */
typedef int M2_STATE;
M2_STATE m2_new() { return 0; }
int m2_empty(M2_STATE s) { return s < INT_MAX; }
int m2_next(M2_STATE s) { int orig_s = s; s = s + 2; return orig_s; }

/* Caller. */
M2_STATE s;
s = m2_new();
while (!m2_empty(s))
{
    int num = m2_next(s);
    printf("%d\n", num);
}

This seems cumbersome for the multiples of two, but it becomes a useful pattern for more complicated generators. You can make the state more complicated without having to burden all the code that uses your generator with the details. Even better practice is to return an opaque pointer in the new function, and (unless GC is available) provide a function for cleaning up the generator.

The big advantage of allocating the state for each new sequence of calls is things like recursive generators. For example, a generator that returns all files under a directory, by calling itself on each subdirectory.

char *walk_next(WALK_STATE *s)
{
    if (s->subgenerator)
    {
        if (walk_is_empty(s->subgenerator))
        {
            walk_finish(s->subgenerator);
            s->subgenerator = NULL;
        }
        else
            return walk_next(s->subgenerator);
    }

    char *name = readdir(s->dir);
    if (is_file(name))
        return name;
    else if (is_dir(name))
    {
        char subpath[MAX_PATH];
        strcpy(subpath, s->path);
        strcat(subpath, name);
        s->subgenerator = walk_new(subpath);
        return walk_next(s->subgenerator);
    }
    closedir(s->dir);
    s->empty = 1;
    return NULL;
}

(You'll have to excuse my misuse of readdir, et al. and my pretense that C has idiot-proof string support.)


You can pass the argument as a pointer to allow the function to modify it without using the return value:

void multiples_of_2(int *i)
{
    *i += 2;
}

And call it:

int i = 0;
multiples_of_2(&i);


int multiples_of_2() {
    static int i = 0;
    i += 2;
    return i;
}

The static int i behaves like a global variable but is visible only within the contect of multiples_of_2().


I have implemented my own lazy eval, with respects to solving the hamming's problem.

Heres my code for anyone whos interested:

#include <stdio.h>
#include <stdlib.h>

// Hamming problem in C.

typedef struct gen {
  int tracker;
  int (*next)(struct gen *g);
} generator;

int twos_gen(struct gen *g) {
  g->tracker = g->tracker + 2;
  return g->tracker;
}

generator* twos_stream() {
  generator *g = malloc(sizeof(generator));
  g->next = twos_gen;
  g->tracker = 0;
  return g;
}

int threes_gen(struct gen *g) {
  g->tracker = g->tracker + 3;
  return g->tracker;
}

generator* threes_stream() {
  generator *g = malloc(sizeof(generator));
  g->next = threes_gen;
  g->tracker = 0;
  return g;
}

int fives_gen(struct gen *g) {
  g->tracker = g->tracker + 5;
  return g->tracker;
}

generator* fives_stream() {
  generator *g = malloc(sizeof(generator));
  g->next = fives_gen;
  g->tracker = 0;
  return g;
}

int smallest(int a, int b, int c) {
  if (a < b) {
    if (c < a) return c;
    return a;
  }
  else {
    if (c < b) return c;
    return b;
  }
}

int hamming_gen(struct gen *g) {
  generator* twos = twos_stream();
  generator* threes = threes_stream();
  generator* fives = fives_stream();

  int c2 = twos->next(twos);
  int c3 = threes->next(threes);
  int c5 = fives->next(fives);

  while (c2 <= g->tracker) c2 = twos->next(twos);
  while (c3 <= g->tracker) c3 = threes->next(threes);
  while (c5 <= g->tracker) c5 = fives->next(fives);

  g->tracker = smallest(c2,c3,c5);
  return g->tracker;
}

generator* hammings_stream() {
  generator *g = malloc(sizeof(generator));
  g->next = hamming_gen;
  g->tracker = 0;
  return g;
}

int main() {
  generator* hammings = hammings_stream();
  int i = 0;
  while (i<10) {
    printf("Hamming No: %d\n",hammings->next(hammings));
    i++;
  }
}
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜