开发者

What is the difference between LR(0) and SLR parsing?

I am working on my compilers concepts however I am a l开发者_StackOverflowittle confused... Googling got me nowhere to a definite answer.

Is SLR and LR(0) parsers one and same? If not, whats the difference?


Both LR(0) and SLR(1) parsers are bottom-up, directional, predictive parsers. This means that

  • The parsers attempt to apply productions in reverse to reduce the input sentence back to the start symbol (bottom-up)
  • The parsers scan the input from left-to-right (directional)
  • The parsers attempt to predict what reductions to apply without necessarily seeing all of the input (predictive)

Both LR(0) and SLR(1) are shift/reduce parsers, meaning that they process the tokens of the input stream by placing them on a stack, and at each point either shifting a token by pushing it onto the stack or reducing some sequence of terminals and nonterminals atop the stack back to some nonterminal symbol. It can be shown that any grammar can be parsed bottom-up using a shift/reduce parser, but that parser might not be deterministic. That is, the parser may have to "guess" whether to apply a shift or reduction, and may end up having to backtrack to realize that it made the wrong choice. No matter how powerful a deterministic shift/reduce parser you construct, it will never be able to parse all grammars.

When a deterministic shift/reduce parser is used to parse a grammar that it cannot handle, it results in shift/reduce conflicts or reduce/reduce conflicts, where the parser may enter a state in which it cannot tell what action to take. In a shift/reduce conflict, it cannot tell whether it should add another symbol to the stack or perform some reduction on the top symbols of the stack. In a reduce/reduce conflict, the parser knows that it needs to replace the top symbols of the stack with some nonterminal, but it can't tell what reduction to use.

I apologize if this is a lengthy exposition, but we need this to be able to address the difference between LR(0) and SLR(1) parsing. An LR(0) parser is a shift/reduce parser that uses zero tokens of lookahead to determine what action to take (hence the 0). This means that in any configuration of the parser, the parser must have an unambiguous action to choose - either it shifts a specific symbol or applies a specific reduction. If there are ever two or more choices to make, the parser fails and we say that the grammar is not LR(0).

Recall that the two possible LR conflicts are shift/reduce and reduce/reduce. In both of these cases, there are at least two actions that the LR(0) automaton could be taking, and it can't tell which of them to use. Since at least one of the conflicting actions is a reduction, a reasonable line of attack would be to try to have the parser be more careful about when it performs a particular reduction. More specifically, let's suppose that the parser is allowed to look at the next token of input to determine whether it should shift or reduce. If we only allow the parser to reduce when it "makes sense" to do so (for some definition of "makes sense"), then we may be able to eliminate the conflict by having the automaton specifically choose to either shift or reduce in a particular step.

In SLR(1) ("Simplified LR(1)"), the parser is allowed to look at one token of lookahead when deciding whether it should shift or reduce. In particular, when the parser wants to try reducing something of the form A → w (for nonterminal A and string w), it looks at the next token of input. If that token could legally appear after the nonterminal A in some derivation, the parser reduces. Otherwise, it does not. The intuition here is that in some cases it makes no sense to attempt a reduction, because given the tokens we've seen so far and the upcoming token, there is no possible way that the reduction could ever be correct.

The only difference between LR(0) and SLR(1) is this extra ability to help decide what action to take when there are conflicts. Because of this, any grammar that can be parsed by an LR(0) parser can be parsed by an SLR(1) parser. However, SLR(1) parsers can parse a larger number of grammars than LR(0).

In practice, though, SLR(1) is still a fairly weak parsing method. More commonly, you will see LALR(1) ("Lookahead LR(1)") parsers being used. They too work by trying to resolve conflicts in an LR(0) parser, but the rules they use for resolving conflicts are far more precise than those used in SLR(1), and consequently a much larger number of grammars are LALR(1) than are SLR(1). To be a bit more specific, SLR(1) parsers try to resolve conflicts by looking at the structure of the grammar to learn more information about when to shift and when to reduce. LALR(1) parsers look at both the grammar and the LR(0) parser to get even more specific information about when to shift and when to reduce. Because LALR(1) can look at the structure of the LR(0) parser, it can more precisely identify when certain conflicts are spurious. The Linux utilities yacc and bison, by default, produce LALR(1) parsers.

Historically, LALR(1) parsers were typically constructed through a different method that relied on the far more powerful LR(1) parser, so you will often see LALR(1) described that way. To understand this, we need to talk about LR(1) parsers. In an LR(0) parser, the parser works by keeping track of where it might be in the middle of a production. Once it has found that it's reached the end of a production, it knows to try to reduce. However, the parser might not be able to tell whether it's in at the end of one production and the middle of another, which leads to a shift/reduce conflict, or which of two different productions it has reached the end of (a reduce/reduce conflict). In LR(0), this immediately leads to a conflict and the parser fails. In SLR(1) or LALR(1), the parser then makes the decision to shift or reduce based on the next token of lookahead.

In an LR(1) parser, the parser keeps track of additional information as it operates. In addition to keeping track of what production the parser believes is being used, it keeps track of what possible tokens might appear after that production is completed. Because the parser keeps track of this information at each step, and not just when it needs to make the decision, the LR(1) parser is substantially more powerful and precise than any of the LR(0), SLR(1), or LALR(1) parsers we've talked about so far. LR(1) is an extremely powerful parsing technique, and it can be shown using some tricky math that any language that could be parsed deterministically by any shift/reduce parser has some grammar that could be parsed with an LR(1) automaton. (Note that this does not mean that all grammars that can be parsed deterministically are LR(1); this only says that a language that could be parsed deterministically has some LR(1) grammar). However, this power comes at a price, and a generated LR(1) parser may require so much information to operate that it can't possibly be used in practice. An LR(1) parser for a real programming language, for example, might require tens to hundreds of megabytes of additional information to operate correctly. For this reason, LR(1) isn't typically used in practice, and weaker parsers like LALR(1) or SLR(1) are used instead.

More recently, a new parsing algorithm called GLR(0) ("Generalized LR(0)") has gained popularity. Rather than trying to resolve the conflicts that appear in an LR(0) parser, the GLR(0) parser instead works by trying all possible options in parallel. Using some clever tricks, this can be made to run very efficiently for many grammars. Moreover, GLR(0) can parse any context-free grammar at all, even grammars that can't be parsed by an LR(k) parser for any k. Other parsers are capable of doing this as well (for example, the Earley parser or a CYK parser), though GLR(0) tends to be faster in practice.

If you're interested in learning more, over this summer I taught an introductory compilers course and spent just under two weeks talking about parsing techniques. If you'd like to get a more rigorous introduction to LR(0), SLR(1), and a host of other powerful parsing techniques, you might enjoy my lecture slides and homework assignments about parsing. All of the course materials are available here on my personal site.


This is what I have learnt . Usually LR(0) parser can have ambiguity, i.e one box of the table (you derive for creating the parser) can have multiple values (or) to better put it : the parser leads to two final states with the same input. So SLR parser is created to remove this ambiguity. Inorder to construct it find all the productions which lead to goto states , find the follow for the production symbol on the left hand side and only include those goto states which are present in the follow . This inturn means that you dont include a production which is not possible using the original grammer(coz that state is not in the follow set)


In the parsing table for LR(0) , the reduce rule for the production is placed in the entire row, across all the terminals whereas in SLR Parsing table the reduce rule for the production is placed only in the Follow set of left hand side Non-terminal of the reduce production.

The tool called parsing-EMU is very helpful in parsing and can generate first, follow, LR(0) itemset, LALR Evaluation etc. You can find it here.


Adding on top of the above answers, the difference in between the individual parsers in the class of bottom-up parsers is whether they result in shift/reduce or reduce/reduce conflicts when generating the parsing tables. The less it will have the conflicts, the more powerful will be the grammar (LR(0) < SLR(1) < LALR(1) < CLR(1)).

For example, consider the following expression grammar:

E → E + T

E → T

T → F

T → T * F

F → ( E )

F → id

It's not LR(0) but SLR(1). Using the following code, we can construct the LR0 automaton and build the parsing table (we need to augment the grammar, compute the DFA with closure, compute the action and goto sets):

from copy import deepcopy
import pandas as pd

def update_items(I, C):
    if len(I) == 0:
         return C
    for nt in C:
         Int = I.get(nt, [])
         for r in C.get(nt, []):
              if not r in Int:
                  Int.append(r)
          I[nt] = Int
     return I

def compute_action_goto(I, I0, sym, NTs): 
    #I0 = deepcopy(I0)
    I1 = {}
    for NT in I:
        C = {}
        for r in I[NT]:
            r = r.copy()
            ix = r.index('.')
            #if ix == len(r)-1: # reduce step
            if ix >= len(r)-1 or r[ix+1] != sym:
                continue
            r[ix:ix+2] = r[ix:ix+2][::-1]    # read the next symbol sym
            C = compute_closure(r, I0, NTs)
            cnt = C.get(NT, [])
            if not r in cnt:
                cnt.append(r)
            C[NT] = cnt
        I1 = update_items(I1, C)
    return I1

def construct_LR0_automaton(G, NTs, Ts):
    I0 = get_start_state(G, NTs, Ts)
    I = deepcopy(I0)
    queue = [0]
    states2items = {0: I}
    items2states = {str(to_str(I)):0}
    parse_table = {}
    cur = 0
    while len(queue) > 0:
        id = queue.pop(0)
        I = states[id]
        # compute goto set for non-terminals
        for NT in NTs:
            I1 = compute_action_goto(I, I0, NT, NTs) 
            if len(I1) > 0:
                state = str(to_str(I1))
                if not state in statess:
                    cur += 1
                    queue.append(cur)
                    states2items[cur] = I1
                    items2states[state] = cur
                    parse_table[id, NT] = cur
                else:
                    parse_table[id, NT] = items2states[state]
        # compute actions for terminals similarly
        # ... ... ...
                    
    return states2items, items2states, parse_table
        
states, statess, parse_table = construct_LR0_automaton(G, NTs, Ts)

where the grammar G, non-terminal and terminal symbols are defined as below

G = {}
NTs = ['E', 'T', 'F']
Ts = {'+', '*', '(', ')', 'id'}
G['E'] = [['E', '+', 'T'], ['T']]
G['T'] = [['T', '*', 'F'], ['F']]
G['F'] = [['(', 'E', ')'], ['id']]

Here are few more useful function I implemented along with the above ones for LR(0) parsing table generation:

def augment(G, S): # start symbol S
    G[S + '1'] = [[S, '$']]
    NTs.append(S + '1')
    return G, NTs

def compute_closure(r, G, NTs):
    S = {}
    queue = [r]
    seen = []
    while len(queue) > 0:
        r = queue.pop(0)
        seen.append(r)
        ix = r.index('.') + 1
        if ix < len(r) and r[ix] in NTs:
            S[r[ix]] = G[r[ix]]
            for rr in G[r[ix]]:
                if not rr in seen:
                    queue.append(rr)
    return S

The following figure (expand it to view) shows the LR0 DFA constructed for the grammar using the above code:

What is the difference between LR(0) and SLR parsing?

The following table shows the LR(0) parsing table generated as a pandas dataframe, notice that there are couple of shift/reduce conflicts, indicating that the grammar is not LR(0).

What is the difference between LR(0) and SLR parsing?

SLR(1) parser avoids the above shift / reduce conflicts by reducing only if the next input token is a member of the Follow Set of the nonterminal being reduced. So the above grammar is not LR(0), but it's SLR(1). The following parse table is generated by SLR:

What is the difference between LR(0) and SLR parsing?

The following animation shows how an input expression is parsed by the above SLR(1) grammar:

What is the difference between LR(0) and SLR parsing?

But, the following grammar which accepts the strings of the form a^ncb^n, n >= 1 is LR(0):

A → a A b

A → c

S → A

Let's define the grammar as follows:

# S --> A 
# A --> a A b | c
G = {}
NTs = ['S', 'A']
Ts = {'a', 'b', 'c'}
G['S'] = [['A']]
G['A'] = [['a', 'A', 'b'], ['c']]

What is the difference between LR(0) and SLR parsing?

As can be seen from the following figure, there is no conflict in the parsing table generated.

What is the difference between LR(0) and SLR parsing?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜