Algorithm for BASH/CSH/ZSH style brace expansion

If I have a string like

a/{b,c,d}/e

then I want to be able to produce this output:

a/b/e
a/c/e
a/d/e

You get the idea. I need to implement this in C. I have written brute-force code that can parse a single pair of braces (for example /a/{b,c,d}/e/), but if there are multiple pairs of braces, like /a/{b,c}/{d,e}/f, my method breaks. I would like to take a better approach.

I am not asking directly for the code; just a hint towards an efficient algorithm would be sufficient. I think parsing the braces is a repetitive task, so could we follow a recursive algorithm?


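If you're on any kind of Unix, Linux or OS X system, there is a built-in library function that can do this. man 3 glob will tell you how to call it from C, or you can visit http://linux.die.net/man/3/glob for the online documentation. Be aware, though, that brace expansion in glob needs the GLOB_BRACE flag, which is a GNU/BSD extension rather than part of POSIX, and that glob matches patterns against the filesystem, so it only returns expansions that name existing files.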
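A minimal sketch of calling glob(3) with brace expansion from C. It assumes glibc or a BSD-derived libc (including macOS), where the non-POSIX GLOB_BRACE flag exists; on glibc the flag is only declared when _GNU_SOURCE is defined before the includes. The helper name print_brace_matches is hypothetical.

```c
#define _GNU_SOURCE              /* glibc: exposes the non-POSIX GLOB_BRACE flag */
#include <assert.h>
#include <glob.h>
#include <stdio.h>

/* Print every existing path that matches a brace pattern and return how many
 * matched. glob(3) checks each expansion against the filesystem, so
 * expansions naming files that do not exist are not reported. */
static size_t print_brace_matches(const char *pattern)
{
    glob_t g;
    size_t n = 0;

    assert(pattern != NULL);
    if (glob(pattern, GLOB_BRACE, NULL, &g) == 0) {
        for (size_t i = 0; i < g.gl_pathc; i++)
            printf("%s\n", g.gl_pathv[i]);
        n = g.gl_pathc;
    }
    globfree(&g);                /* release whatever glob() allocated */
    return n;
}
```

In a directory that contains a/b/e and a/d/e but no a/c/e, print_brace_matches("a/{b,c,d}/e") should print only the two existing paths. That makes glob a good fit for matching files, but not for expanding arbitrary strings.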

If you want to roll your own, a simple way to go is to first scan the string and build an intermediate data structure, and then recursively walk that data structure, printing strings. That data structure could be built out of structs with the following fields:

  • text: pointer to a piece of string
  • next_node: pointer to what comes after this text when printed
  • sibling_node: pointer to the next choice that could be made instead of this one
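Here is a minimal sketch of that structure and the recursive walk. The names struct node, walk and expand_example, and the fixed buffer size PREFIX_MAX, are assumptions for illustration. Building the structure from the string is the scanning step; to keep the example short, the structure for a/{b,c,d}/e is built by hand, with every choice pointing at one shared "/e" node.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PREFIX_MAX 256   /* assumed maximum length of one output string */

/* One piece of text with the three fields listed above (hypothetical layout). */
struct node {
    const char *text;           /* piece of the string */
    struct node *next_node;     /* what comes after this text, or NULL at the end */
    struct node *sibling_node;  /* next alternative to this one, or NULL */
};

/* Try each alternative in n's sibling chain; prefix[0..len) holds the text
 * chosen so far. Each finished string is appended to out followed by '\n'. */
static void walk(const struct node *n, char *prefix, size_t len,
                 char *out, size_t out_size)
{
    for (; n != NULL; n = n->sibling_node) {
        size_t tlen = strlen(n->text);

        assert(len + tlen < PREFIX_MAX);
        memcpy(prefix + len, n->text, tlen + 1);   /* overwrite the previous choice */
        if (n->next_node == NULL) {
            size_t used = strlen(out);
            snprintf(out + used, out_size - used, "%s\n", prefix);
        } else {
            walk(n->next_node, prefix, len + tlen, out, out_size);
        }
    }
}

/* Hand-built structure for "a/{b,c,d}/e"; every choice shares the "/e" node. */
static void expand_example(char *out, size_t out_size)
{
    struct node tail = { "/e", NULL, NULL };
    struct node d = { "d", &tail, NULL };
    struct node c = { "c", &tail, &d };
    struct node b = { "b", &tail, &c };
    struct node head = { "a/", &b, NULL };
    char prefix[PREFIX_MAX];

    out[0] = '\0';
    walk(&head, prefix, 0, out, out_size);
}
```

Because several choices can point at the same next_node, the structure is really a small graph, and the walk simply revisits the shared suffix once per path.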


What you're showing here isn't really recursive. If you could nest the braces, then it would be recursive.

Basically, what you have is a simple grammar:

thing   ::= element ( "/" element )*
element ::= symbol | list
list    ::= "{" symbol ( "," symbol )* "}"
symbol  ::= [a-z]+

That's an off-the-cuff grammar notation: * means "zero or more", + means "one or more", | separates alternatives, parentheses group, and quoted characters are literal tokens. Fairly common.

So you need a simple tokenizer: something that mostly groups up your symbols and separates out the punctuation.
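A tokenizer for the grammar above could look like the following sketch. The token kinds and the next_token function are illustrative names, not from the answer; symbols are limited to [a-z]+ exactly as in the grammar, and any other character becomes TOK_ERROR.

```c
#include <assert.h>
#include <stddef.h>

/* Token kinds for the grammar above (illustrative names). */
enum tok_kind {
    TOK_SYMBOL, TOK_SLASH, TOK_OPEN, TOK_COMMA, TOK_CLOSE, TOK_END, TOK_ERROR
};

struct token {
    enum tok_kind kind;
    const char *start;   /* points into the input; not NUL-terminated */
    size_t len;          /* length of the lexeme, 0 for TOK_END */
};

/* symbol ::= [a-z]+ */
static int is_symbol_char(char c)
{
    return c >= 'a' && c <= 'z';
}

/* Scan one token at *p and advance *p past it. */
static struct token next_token(const char **p)
{
    const char *s;
    struct token t;

    assert(p != NULL && *p != NULL);
    s = *p;
    t.kind = TOK_END;
    t.start = s;
    t.len = 0;
    if (*s == '\0')
        return t;                 /* end of input: *p stays put */

    if (is_symbol_char(*s)) {
        while (is_symbol_char(*s))
            s++;                  /* group consecutive letters into one symbol */
        t.kind = TOK_SYMBOL;
    } else {
        switch (*s) {             /* punctuation is always one character */
        case '/': t.kind = TOK_SLASH; break;
        case '{': t.kind = TOK_OPEN;  break;
        case ',': t.kind = TOK_COMMA; break;
        case '}': t.kind = TOK_CLOSE; break;
        default:  t.kind = TOK_ERROR; break;
        }
        s++;
    }
    t.len = (size_t)(s - *p);
    *p = s;
    return t;
}
```

A parser then only ever looks at the kind of the next token, which is what the peekNextToken/consumeNextToken calls below stand for.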

Then a simple recursive-descent parser:

Thing parseThing() {
    Thing thing = new Thing();
    thing.add(parseElement());
    while (peekNextToken() != null) {
        parseSlash();
        thing.add(parseElement());
    }
    return thing;
}

Slash parseSlash() {
    Token t = peekNextToken();
    if (t.getText().equals("/")) {
        consumeNextToken();
        return new Slash();
    }
    throw new IllegalStateException("expected a '/' but got " + t);
}

Element parseElement() {
    Token t = peekNextToken();
    if (t.isSymbol()) {
        return parseSymbol();
    }
    if (t.isOpenCurly()) {
        return parseList();
    }
    throw new IllegalStateException("syntax error: wanted a symbol or '{' but got " + t);
}

List parseList() {
    List l = new List();
    Token t = peekNextToken();
    if (t.isOpenCurly()) {
        consumeNextToken();
        Symbol s = parseSymbol();
        l.add(s);
        t = peekNextToken();
        while (t.isComma()) {
            consumeNextToken();
            s = parseSymbol();
            l.add(s);
            t = peekNextToken();
        }
        if (!t.isCloseCurly()) {
            throw new IllegalStateException("expected close of list, but got " + t);
        }
        consumeNextToken();
    } else {
        throw new IllegalStateException("expected start of list but got " + t);
    }
    return l;
}

Symbol parseSymbol() {
    Token t = peekNextToken();
    if (!t.isSymbol()) {
        throw new IllegalStateException("expected symbol, got " + t);
    }
    consumeNextToken();
    return new Symbol(t);
}

This is incomplete and high-level, but it gives you an idea of how you could go about it.
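For comparison, here is a runnable C sketch of the same recursive descent, with one function per grammar rule. Assumptions, none of them from the answer: it reads characters directly instead of using a separate tokenizer (every token except a symbol is a single character), it stores results in fixed-size arrays (MAX_STRS, MAX_LEN), and instead of building a tree it expands while parsing, combining each element's alternatives with the strings collected so far. All names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_STRS 32   /* assumed capacity of a result set */
#define MAX_LEN  64   /* assumed maximum length of one string, including NUL */

struct strset {
    size_t n;
    char s[MAX_STRS][MAX_LEN];
};

struct parser {
    const char *p;    /* current position in the input */
    int ok;           /* cleared on the first syntax or capacity error */
};

static int is_sym(char c) { return c >= 'a' && c <= 'z'; }

/* symbol ::= [a-z]+  (copied into buf) */
static void parse_symbol(struct parser *ps, char *buf)
{
    size_t len = 0;

    if (!is_sym(*ps->p)) { ps->ok = 0; return; }   /* expected a symbol */
    while (is_sym(*ps->p)) {
        if (len + 1 == MAX_LEN) { ps->ok = 0; return; }
        buf[len++] = *ps->p++;
    }
    buf[len] = '\0';
}

/* list ::= "{" symbol ( "," symbol )* "}"  (each symbol is one alternative) */
static void parse_list(struct parser *ps, struct strset *out)
{
    out->n = 0;
    ps->p++;                                       /* consume "{" */
    for (;;) {
        if (out->n == MAX_STRS) { ps->ok = 0; return; }
        parse_symbol(ps, out->s[out->n]);
        if (!ps->ok) return;
        out->n++;
        if (*ps->p != ',') break;
        ps->p++;                                   /* consume "," */
    }
    if (*ps->p != '}') { ps->ok = 0; return; }     /* expected close of list */
    ps->p++;                                       /* consume "}" */
}

/* element ::= symbol | list */
static void parse_element(struct parser *ps, struct strset *out)
{
    if (*ps->p == '{') {
        parse_list(ps, out);
    } else {
        out->n = 1;
        parse_symbol(ps, out->s[0]);
    }
}

/* Replace acc with every x + sep + y for x in acc, y in alts. */
static void combine(struct parser *ps, struct strset *acc,
                    const struct strset *alts, const char *sep)
{
    struct strset tmp;

    tmp.n = 0;
    for (size_t i = 0; i < acc->n; i++) {
        for (size_t j = 0; j < alts->n; j++) {
            int w;
            if (tmp.n == MAX_STRS) { ps->ok = 0; return; }
            w = snprintf(tmp.s[tmp.n], MAX_LEN, "%s%s%s", acc->s[i], sep, alts->s[j]);
            if (w < 0 || w >= MAX_LEN) { ps->ok = 0; return; }
            tmp.n++;
        }
    }
    *acc = tmp;
}

/* thing ::= element ( "/" element )*  ; returns 1 on success, 0 on error */
static int parse_thing(const char *input, struct strset *out)
{
    struct parser ps = { input, 1 };
    struct strset alts;

    assert(input != NULL && out != NULL);
    parse_element(&ps, out);
    while (ps.ok && *ps.p == '/') {
        ps.p++;                                    /* consume "/" */
        parse_element(&ps, &alts);
        if (ps.ok)
            combine(&ps, out, &alts, "/");
    }
    return ps.ok && *ps.p == '\0';                 /* leftovers are an error */
}
```

With this sketch, parse_thing("a/{b,c}/{d,e}/f", &r) yields a/b/d/f, a/b/e/f, a/c/d/f and a/c/e/f in that order, while a malformed list such as {b,} makes it return 0.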


I have been doing something like this recently, and it took me a long time to solve, so here's how I do it. There may be a simpler algorithm for this, though.

You can write a recursive descent parser to transform the text into a tree. Make each run of plain text a leaf node that holds that string, and each matched pair of braces an internal node. Each leaf node can contain more than one string.

For example, this:

/a/{b,c}/{d,e{f,g,h}}i

can become:

(
   ["/a/"]
   {
      ( ["b"] )
      ( ["c"] )
   }
   ["/"]
   {
      ( ["d"] )
      (
         ["e"]
         {
            ( ["f"] )
            ( ["g"] )
            ( ["h"] )
         }
      )
   }
   ["i"]
)

Try to look at it as a tree, where ["stringA", "stringB"] denotes a leaf node and each matched pair of brackets denotes an internal node. There are two types of internal node: one that chooses one of its alternatives (I use {} in this example) and one that concatenates its children in every combination (I use () here).

So the above tree is reduced, bottom-up, like this:

(
   ["/a/"]
   {
      ["b"]
      ["c"]
   }
   ["/"]
   {
      ["d"]
      (
         ["e"]
         {
            ["f"]
            ["g"]
            ["h"]
         }
      )
   }
   ["i"]
)

then

(
   ["/a/"]
   ["b", "c"]
   ["/"]
   {
      ["d"]
      (
         ["e"]
         ["f", "g", "h"]
      )
   }
   ["i"]
)

then

(
   ["/a/"]
   ["b", "c"]
   ["/"]
   {
      ["d"]
      ["ef", "eg", "eh"]
   }
   ["i"]
)

then

(
   ["/a/"]
   ["b", "c"]
   ["/"]
   ["d", "ef", "eg", "eh"]
   ["i"]
)

and finally, you end up with a single leaf node holding all the combinations:

["/a/b/di", "/a/b/efi", "/a/b/egi", "/a/b/ehi",
 "/a/c/di", "/a/c/efi", "/a/c/egi", "/a/c/ehi"]

Then you can pretty-print it.
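The reduction above can be folded into the parser: evaluate each node as soon as it is parsed, so a () node becomes the cartesian product of its children's string sets and a {} node becomes their union. A choice node therefore holds the sum of its children's sizes and a concatenation node the product, which for this example gives 2 × (1 + 3) = 8 strings. The sketch below works under these assumptions: the names are hypothetical, capacities are fixed (MAX_STRS, MAX_LEN), a group with a single alternative such as {x} is still treated as a group (bash would leave it literal), and a stray ',' or '}' outside any group is reported as an error.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_STRS 64   /* assumed capacity of one leaf (string set) */
#define MAX_LEN  64   /* assumed maximum string length, including NUL */

/* A leaf node from the answer: a set of alternative strings. */
struct leaf {
    size_t n;
    char s[MAX_STRS][MAX_LEN];
};

struct ctx {
    const char *p;    /* current position in the pattern */
    int ok;           /* cleared on the first syntax or capacity error */
};

/* "()" node: replace acc with every x + y for x in acc, y in b. */
static void product(struct ctx *c, struct leaf *acc, const struct leaf *b)
{
    struct leaf tmp;

    tmp.n = 0;
    for (size_t i = 0; i < acc->n; i++)
        for (size_t j = 0; j < b->n; j++) {
            int w;
            if (tmp.n == MAX_STRS) { c->ok = 0; return; }
            w = snprintf(tmp.s[tmp.n], MAX_LEN, "%s%s", acc->s[i], b->s[j]);
            if (w < 0 || w >= MAX_LEN) { c->ok = 0; return; }
            tmp.n++;
        }
    *acc = tmp;
}

/* "{}" node: append every string of b to acc (union, order kept). */
static void append(struct ctx *c, struct leaf *acc, const struct leaf *b)
{
    for (size_t j = 0; j < b->n; j++) {
        if (acc->n == MAX_STRS) { c->ok = 0; return; }
        strcpy(acc->s[acc->n++], b->s[j]);
    }
}

static void parse_seq(struct ctx *c, struct leaf *out);

/* group ::= "{" seq ( "," seq )* "}" */
static void parse_group(struct ctx *c, struct leaf *out)
{
    struct leaf alt;

    out->n = 0;
    c->p++;                                   /* consume "{" */
    for (;;) {
        parse_seq(c, &alt);
        if (!c->ok) return;
        append(c, out, &alt);
        if (!c->ok) return;
        if (*c->p == ',') { c->p++; continue; }
        if (*c->p == '}') { c->p++; return; }
        c->ok = 0;                            /* unterminated group */
        return;
    }
}

/* seq ::= ( text | group )*  evaluated left to right as one "()" node */
static void parse_seq(struct ctx *c, struct leaf *out)
{
    struct leaf part;

    out->n = 1;
    out->s[0][0] = '\0';                      /* start from one empty string */
    while (c->ok && *c->p != '\0' && *c->p != ',' && *c->p != '}') {
        if (*c->p == '{') {
            parse_group(c, &part);
        } else {
            size_t len = strcspn(c->p, "{,}");  /* a run of plain text */
            if (len >= MAX_LEN) { c->ok = 0; return; }
            memcpy(part.s[0], c->p, len);
            part.s[0][len] = '\0';
            part.n = 1;
            c->p += len;
        }
        if (c->ok)
            product(c, out, &part);
    }
}

/* Expand pattern into out; returns 1 on success, 0 on error. */
static int expand_tree(const char *pattern, struct leaf *out)
{
    struct ctx c = { pattern, 1 };

    assert(pattern != NULL && out != NULL);
    parse_seq(&c, out);
    return c.ok && *c.p == '\0';              /* stray ',' or '}' at top level */
}
```

With the input /a/{b,c}/{d,e{f,g,h}}i, expand_tree should produce exactly the eight strings listed above, in the same order.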


Dunno about efficient, but an intuitive way would be to use some form of recursion. The function finds the first brace group, from the first "{" to its matching "}". Say that group contains N alternatives. The function then produces N expansions, each with one alternative substituted for the whole group, and recursively calls itself on each expansion. Each "fork" keeps on forking until no braces are left.
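A minimal sketch of this forking recursion in C. It assumes a hypothetical expand_fork function that appends each finished string to a caller-supplied buffer, and a fixed bound MAX_LEN on intermediate strings. The important detail is finding the "}" that matches the first "{" and splitting only on commas at the top level of that group, so nested groups survive into the forks and are expanded at the next level of recursion. In this sketch, if the first "{" has no matching "}", the string is emitted unchanged; unlike bash, a group without a comma such as {x} is still expanded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LEN 256   /* assumed upper bound on any intermediate string */

/* Expand every brace group in s, appending each result plus '\n' to out.
 * Output is silently truncated if out is too small. */
static void expand_fork(const char *s, char *out, size_t out_size)
{
    const char *open = strchr(s, '{');
    const char *close = NULL;
    const char *alt, *p;
    int depth = 0;

    if (open != NULL) {
        /* Find the "}" matching the first "{", skipping nested pairs. */
        for (p = open; *p != '\0'; p++) {
            if (*p == '{') {
                depth++;
            } else if (*p == '}' && --depth == 0) {
                close = p;
                break;
            }
        }
    }
    if (close == NULL) {
        /* No complete group left: this fork is finished. */
        size_t used = strlen(out);
        snprintf(out + used, out_size - used, "%s\n", s);
        return;
    }

    /* Fork once per top-level alternative: prefix + alternative + suffix. */
    alt = open + 1;
    depth = 0;
    for (p = open + 1; p <= close; p++) {
        if (*p == '{') {
            depth++;
        } else if (*p == '}' && p != close) {
            depth--;
        } else if (p == close || (*p == ',' && depth == 0)) {
            char buf[MAX_LEN];
            int w = snprintf(buf, sizeof buf, "%.*s%.*s%s",
                             (int)(open - s), s, (int)(p - alt), alt, close + 1);
            assert(w >= 0 && w < MAX_LEN);
            expand_fork(buf, out, out_size);   /* recurse on this expansion */
            alt = p + 1;
        }
    }
}
```

For the question's /a/{b,c}/{d,e}/f, the first level forks into /a/b/{d,e}/f and /a/c/{d,e}/f, and each of those forks again, giving /a/b/d/f, /a/b/e/f, /a/c/d/f and /a/c/e/f.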

Does that help?
