Algorithm for BASH/CSH/ZSH style brace expansion

2023-03-10 11:31

If I have a string like

a/{b,c,d}/e

then I want to be able to produce this output:

a/b/e
a/c/e
a/d/e

You get the idea. I need to implement this in C. I have written some brute-force code which is capable of parsing a single pair of braces (for example, /a/{b,c,d}/e/), but if there are multiple pairs of braces, like /a/{b,c}/{d,e}/f, my method breaks. I would like to take a better approach.

I am not asking directly for the code; a hint towards an efficient algorithm would be sufficient. I think the task of parsing the braces is repetitive, so could we follow a recursive algorithm?

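If you're on any kind of Unix, Linux or OS X system, there is a built-in library function to do this. man 3 glob will tell you how to call it from C, or you can visit http://linux.die.net/man/3/glob for online documentation. Two caveats: brace expansion is an extension that has to be requested with the GLOB_BRACE flag (glibc and the BSDs have it, POSIX does not), and glob is a pathname matcher, so by default it only reports paths that actually exist.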

If you want to roll your own, a simple way to go is to first scan the string and build an intermediate data structure, and then recursively walk that data structure, printing strings. That data structure could be built out of structs with the following fields:

text: pointer to a piece of string
next_node: pointer to what comes after this text when printed
sibling_node: pointer to the next choice that could be made instead of this one
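A minimal sketch of that structure in C, with a recursive walk over it. The field names follow the list above; the walk function, its buffer parameters and the idea of collecting results in one output string are assumptions made for illustration, and there is no bounds checking.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One piece of the pattern, using the three fields described above. */
struct node {
    const char *text;           /* literal text of this piece */
    struct node *next_node;     /* what follows this piece, or NULL at the end */
    struct node *sibling_node;  /* next alternative to this piece, or NULL */
};

/*
 * Depth-first walk. buf holds the string built so far (len characters).
 * Each complete string is appended to out, followed by a newline.
 * Both buffers must be large enough (assumption: the caller sizes them).
 */
void walk(const struct node *n, char *buf, size_t len, char *out)
{
    assert(buf != NULL && out != NULL);
    for (; n != NULL; n = n->sibling_node) {   /* try each alternative in turn */
        size_t tlen = strlen(n->text);
        memcpy(buf + len, n->text, tlen);
        if (n->next_node != NULL) {
            walk(n->next_node, buf, len + tlen, out);
        } else {
            buf[len + tlen] = '\0';            /* end of a chain: emit it */
            strcat(out, buf);
            strcat(out, "\n");
        }
    }
}
```

For a/{b,c,d}/e you would build an "a/" node whose next_node is "b", chain "c" and "d" to it through sibling_node, and point the next_node of all three at one shared "/e" node. Calling walk(&head, buf, 0, out) with an empty out then leaves the three expanded lines in out.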

What you're showing here isn't really recursive. If you could nest the brackets, then that would be recursive.

Basically, what you have is a simple grammar:

thing ::= element { "/" element }*
element ::= symbol | list
list ::= "{" symbol { "," symbol }* "}"
symbol ::= [a-z]+

That's an off-the-cuff grammar notation: * means "zero or more", + means "one or more", and the unquoted { } just group things (the quoted "{" and "}" are the literal characters). Fairly common.

So you need a simple tokenizer: something that groups characters into symbols and splits out the punctuation.
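A tokenizer for this grammar could look roughly like the following in C. The token kinds and the next_token name are made up for this sketch; symbols are runs of lowercase letters, as in symbol ::= [a-z]+, and anything else that is not punctuation comes back as an error token.

```c
#include <assert.h>
#include <stddef.h>

/* Token kinds for the grammar above (names are illustrative). */
enum tok_kind {
    TOK_SYMBOL, TOK_SLASH, TOK_LBRACE, TOK_RBRACE, TOK_COMMA,
    TOK_END, TOK_ERROR
};

struct token {
    enum tok_kind kind;
    const char *start;  /* first character of the token in the input */
    size_t len;         /* number of characters in the token */
};

/* Read one token starting at *pos and advance *pos past it. */
struct token next_token(const char **pos)
{
    assert(pos != NULL && *pos != NULL);
    const char *p = *pos;
    struct token t = { TOK_ERROR, p, 1 };

    if (*p == '\0') {
        /* Stay on the terminator so repeated calls keep returning TOK_END. */
        t.kind = TOK_END;
        t.len = 0;
        return t;
    }
    if (*p >= 'a' && *p <= 'z') {
        /* A symbol is the longest run of lowercase letters. */
        while (*p >= 'a' && *p <= 'z')
            p++;
        t.kind = TOK_SYMBOL;
        t.len = (size_t)(p - t.start);
    } else {
        switch (*p) {
        case '/': t.kind = TOK_SLASH;  break;
        case '{': t.kind = TOK_LBRACE; break;
        case '}': t.kind = TOK_RBRACE; break;
        case ',': t.kind = TOK_COMMA;  break;
        default:  t.kind = TOK_ERROR;  break;
        }
        p++;  /* punctuation (or an unknown character) is one character long */
    }
    *pos = p;
    return t;
}
```

A peek operation, as used by the parser, can be had by calling next_token on a copy of the position pointer.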

Then a simple parser:

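Thing parseThing() {
    Thing thing = new Thing();
    thing.add(parseElement());
    // Every further element must be preceded by a slash.
    while (peekNextToken() != null) {
        parseSlash();
        thing.add(parseElement());
    }
    return thing;
}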

Slash parseSlash() {
    Token t = peekNextToken();
    if (t.getText().equals("/")) {
        consumeNextToken();  // the slash was recognized, so move past it
        return new Slash();
    }
    throw "expected a '/' but got a " + t;
}

Element parseElement() {
    Token t = peekNextToken();
    if (t.isSymbol()) {
        return parseSymbol();
    }
    if (t.isOpenCurly()) {
        return parseList();
    }
    throw "Syntax error, wanted a symbol or { and got " + t;
}

List parseList() {
    List l = new List();
    Token t = peekNextToken();
    if (t.isOpenCurly()) {
        consumeNextToken();
        Symbol s = parseSymbol();
        l.add(s);
        t = peekNextToken();
        while (t.isComma()) {
            consumeNextToken();
            s = parseSymbol();
            l.add(s);
            t = peekNextToken();
        }
        if (!t.isCloseCurly()) {
            throw "expected close of list, but got " + t;
        }
        consumeNextToken();
    } else {
        throw "expected start of list but got " + t;
    }
    return l;
}

Symbol parseSymbol() {
    Token t = peekNextToken();

    if(!t.isSymbol()) {
        throw "expected symbol, got " + t;
    }
    consumeNextToken();
    return new Symbol(t);
}

This is incomplete and high-level, but it gives you an idea of how you could go about it.
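Since the question asks for C, here is a rough C sketch of the same recursive-descent idea, working directly on characters instead of tokens to keep it short. All names, the fixed array limits and the expand helper are assumptions for illustration. It accepts exactly the grammar above, so a pattern with a leading slash, nested braces or empty alternatives is reported as a syntax error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ELEMS 16   /* arbitrary limits for this sketch */
#define MAX_ALTS  16

struct sym { const char *s; size_t len; };
struct element { struct sym alt[MAX_ALTS]; int n; };
struct thing { struct element el[MAX_ELEMS]; int n; };

/* symbol ::= [a-z]+ */
int parse_symbol(const char **p, struct sym *out)
{
    const char *start = *p;
    while (**p >= 'a' && **p <= 'z')
        (*p)++;
    out->s = start;
    out->len = (size_t)(*p - start);
    return out->len > 0;
}

/* list ::= "{" symbol { "," symbol }* "}" */
int parse_list(const char **p, struct element *e)
{
    if (**p != '{') return 0;
    do {
        (*p)++;  /* skip '{' on the first pass, ',' afterwards */
        if (e->n == MAX_ALTS || !parse_symbol(p, &e->alt[e->n])) return 0;
        e->n++;
    } while (**p == ',');
    if (**p != '}') return 0;
    (*p)++;
    return 1;
}

/* element ::= symbol | list */
int parse_element(const char **p, struct element *e)
{
    e->n = 0;
    if (**p == '{')
        return parse_list(p, e);
    if (!parse_symbol(p, &e->alt[0])) return 0;
    e->n = 1;
    return 1;
}

/* thing ::= element { "/" element }*   Returns 1 on success, 0 on a syntax error. */
int parse_thing(const char *p, struct thing *t)
{
    t->n = 0;
    do {
        if (t->n > 0) p++;  /* skip the '/' between elements */
        if (t->n == MAX_ELEMS || !parse_element(&p, &t->el[t->n])) return 0;
        t->n++;
    } while (*p == '/');
    return *p == '\0';
}

/* Append every expansion of elements i..n-1 to out, one per line. */
void expand(const struct thing *t, int i, char *buf, size_t len, char *out)
{
    assert(t != NULL && buf != NULL && out != NULL);
    if (i == t->n) {
        buf[len] = '\0';
        strcat(out, buf);
        strcat(out, "\n");
        return;
    }
    size_t start = len;
    if (i > 0) buf[start++] = '/';  /* put back the separator the parser skipped */
    for (int k = 0; k < t->el[i].n; k++) {
        const struct sym *s = &t->el[i].alt[k];
        memcpy(buf + start, s->s, s->len);
        expand(t, i + 1, buf, start + s->len, out);
    }
}
```

After a successful parse_thing("a/{b,c}/{d,e}/f", &t), calling expand(&t, 0, buf, 0, out) with an empty out string collects the four expansions in out. The caller has to make buf and out big enough.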

I have been doing something like this recently, and it took me a lot of time to solve, so here's how I do it. There may be a simpler algorithm for this though.

You can write a recursive descent parser to transform the text into a tree. Make each piece of literal text a leaf node that holds that string, and each matched pair of braces an internal node. A leaf node can contain more than one string.

For example, this:

/a/{b,c}/{d,e{f,g,h}}i

can become:

(
   ["/a/"]
   {
      ( ["b"] )
      ( ["c"] )
   }
   ["/"]
   {
      ( ["d"] )
      (
         ["e"]
         {
            ( ["f"] )
            ( ["g"] )
            ( ["h"] )
         }
      )
   }
   ["i"]
)

Try to look at it as a tree, where ["stringA", "stringB"] denotes a leaf node and a matched pair of brackets represents an internal node. There are 2 types of internal node: one that chooses between its alternatives (I use {} in this example) and one that produces every combination of its children, concatenated in order (I use () here).

So, the reduction of the above tree would go like this (dropping the parentheses around single leaves):

(
   ["/a/"]
   {
      ["b"]
      ["c"]
   }
   ["/"]
   {
      ["d"]
      (
         ["e"]
         {
            ["f"]
            ["g"]
            ["h"]
         }
      )
   }
   ["i"]
)

then

(
   ["/a/"]
   ["b", "c"]
   ["/"]
   {
      ["d"]
      (
         ["e"]
         ["f", "g", "h"]
      )
   }
   ["i"]
)

then

(
   ["/a/"]
   ["b", "c"]
   ["/"]
   {
      ["d"]
      ["ef", "eg", "eh"]
   }
   ["i"]
)

then

(
   ["/a/"]
   ["b", "c"]
   ["/"]
   ["d", "ef", "eg", "eh"]
   ["i"]
)

and finally, you end up with a single leaf node that holds all the combinations:

["/a/b/di", "/a/b/efi", "/a/b/egi", "/a/b/ehi",
 "/a/c/di", "/a/c/efi", "/a/c/egi", "/a/c/ehi"]

Then you can pretty print it.
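The two reduction steps used above can be sketched in C like this. The struct, the function names choose and combine, and the fixed-size arrays are assumptions for illustration only; real code would allocate dynamically. The output leaf must not be the same object as either input.

```c
#include <assert.h>
#include <string.h>

#define MAX_STRS 32   /* arbitrary limits for this sketch */
#define MAX_LEN  64

/* A leaf node: a list of strings. */
struct leaf {
    char s[MAX_STRS][MAX_LEN];
    int n;
};

/* {} node: the result is every string of a, then every string of b. */
void choose(const struct leaf *a, const struct leaf *b, struct leaf *out)
{
    assert(a->n + b->n <= MAX_STRS);
    out->n = 0;
    for (int i = 0; i < a->n; i++)
        strcpy(out->s[out->n++], a->s[i]);
    for (int j = 0; j < b->n; j++)
        strcpy(out->s[out->n++], b->s[j]);
}

/* () node: every string of a concatenated with every string of b. */
void combine(const struct leaf *a, const struct leaf *b, struct leaf *out)
{
    assert(a->n * b->n <= MAX_STRS);
    out->n = 0;
    for (int i = 0; i < a->n; i++) {
        for (int j = 0; j < b->n; j++) {
            assert(strlen(a->s[i]) + strlen(b->s[j]) < MAX_LEN);
            strcpy(out->s[out->n], a->s[i]);
            strcat(out->s[out->n], b->s[j]);
            out->n++;
        }
    }
}
```

With these, combine on ["e"] and ["f", "g", "h"] gives ["ef", "eg", "eh"], choose on ["d"] and that result gives ["d", "ef", "eg", "eh"], and a chain of combine calls over the top-level children yields the final list of eight strings.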

Dunno about efficient, but an intuitive way would be to use some form of recursion. The function should be able to find the first brace. Say the first brace contains N alternatives. So the function produces N expansions, and recursively calls itself upon each expansion. Each "fork" keeps on forking till it exhausts every brace.

Does that help?
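Here is roughly what that forking recursion could look like in C. It is only a sketch: the function name and the output buffer are made up, braces are assumed to be balanced (an unbalanced pattern is passed through as literal text), there is no escaping, and the caller must make out large enough. Because it looks for the matching brace and only splits on top-level commas, nested braces work too.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Expand the first brace group in s and recurse on each resulting string.
 * Fully expanded strings are appended to out, one per line.
 */
void expand(const char *s, char *out)
{
    assert(s != NULL && out != NULL);
    const char *open = strchr(s, '{');
    if (open == NULL) {              /* nothing left to expand */
        strcat(out, s);
        strcat(out, "\n");
        return;
    }

    /* Find the end of the group, counting nested braces. */
    const char *rest = open + 1;
    for (int depth = 1; depth > 0; rest++) {
        if (*rest == '\0') {         /* unbalanced: treat as literal text */
            strcat(out, s);
            strcat(out, "\n");
            return;
        }
        if (*rest == '{') depth++;
        if (*rest == '}') depth--;
    }
    /* rest now points just past the matching '}'. */

    /* Fork once per top-level alternative between the braces. */
    const char *alt = open + 1;
    int depth = 0;
    for (const char *p = open + 1; p < rest; p++) {
        if (*p == '{') depth++;
        else if (*p == '}') depth--; /* the matching '}' takes depth to -1 */
        if ((*p == ',' && depth == 0) || p == rest - 1) {
            size_t pre = (size_t)(open - s);
            size_t mid = (size_t)(p - alt);
            size_t post = strlen(rest);
            char *next = malloc(pre + mid + post + 1);
            if (next == NULL) abort();
            memcpy(next, s, pre);                      /* text before the group */
            memcpy(next + pre, alt, mid);              /* the chosen alternative */
            memcpy(next + pre + mid, rest, post + 1);  /* text after, with '\0' */
            expand(next, out);
            free(next);
            alt = p + 1;
        }
    }
}
```

Each fork copies the whole string, so this is not the most efficient approach, but it is short and handles patterns like /a/{b,c}/{d,e}/f without any intermediate data structure.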
