Concurrent parsing algorithms
Are there existing parser algorithms (开发者_运维技巧similar to LALR, SLR and LL) that can parse a single input, not just multiple inputs, in parallel?
Edit: Sorry, I wasn't really looking for research papers, more like, "There are compiler-compilers that generate concurrent parsers" or "This compiler for this language parses it in parallel"- real world examples.
There aren't any well known ones :-}
Much of the reason is the problem is described as parsing a string, presenting to the parser one token a time. That makes the problem sequential by definition, ugh.
One could imagine presenting the array of tokens to some parser all at once, and then have the parser parse substrings at various points across the array, stitching compatible trees for substrings together. The stitching process is likely to be complicated, but might be manageable if driven by an L(AL)R [better, a GLR] parser that swallowed nonterminals left-to-right after most of the parse trees for substrings were produced; think of this an an "accumulator".
[Shades, a quick Google search produces a 1990 Japanese paper on doing parallel GLR with what amounts to parallel Prolog]
You now have the problem of producing the array of tokens magically in parallel. Now you need a parallel lexer :-}
EDIT June 2013: I finally remembered McKeeman's 1982 paper on parallel LR parsing.
I'm working on a deterministic context free parsing algorithm with O(N) work complexity, O(log N) time complexity, fine-grained parallelism that is proportional to the length of the input string, and equivalent behavior to a LR parser. I'll be submitting it for peer review shortly.
The main idea is to process each character in the input stream independently, assume that it could match any rule, then piece together neighboring groups of characters until they unambiguously match a single rule. Once a rule is matched, it is filtered out by the algorithm. After all rules are matched, tokens are gathered together into a sequence.
There is some complexity involved in handling tokens with wildcards that could partially nest, and a post-processing step is needed to handle these and maintain the worst-case O(log N) complexity. This step probably is not needed in practice.
精彩评论