Ambiguity Resolution

Next: Tokens Up: Concrete Syntax Previous: Grammar Contents

Ambiguity Resolution

The grammar as given above is ambiguous. We resolve the ambiguity as follows.

The Vesta parser accepts a modified grammar in which the > token is replaced by two distinct tokens: GREATER in the production for Expr4 and RANGLE in the production for List. The modified grammar is unambiguous and can easily be parsed by an LL(1) or LALR(1) automaton.

The Vesta tokenizer is responsible for disambiguating between GREATER and RANGLE wherever > appears in the input. It does so by looking ahead to the next token after the >. If the next token is one of

    - ! ( ERR TRUE FALSE Text Integer Id < [ {

then the > is taken as GREATER; otherwise, it is taken as RANGLE.

Why is this solution reasonable? Inspection of the grammar shows that in a syntactically valid program, the next token after GREATER must be one of those in the list above. The next token after RANGLE must be one of the following:

    : * + ++ - == != < GREATER <= >= && || =>
    ; do , ) then else RANGLE ] % / \ ! (

These sets overlap in the tokens -, !, (, and <. Because we have chosen to resolve these cases as GREATER, it is impossible to write certain syntactically valid programs containing RANGLE. However, any such program can be rewritten by replacing every List nonterminal by ( List ), yielding a semantically equivalent program in which the closing > of the List is correctly resolved as RANGLE. Moreover, we claim (without presenting a proof) that any program in which RANGLE is followed by -, !, (, or < must have a runtime type error, due to the paucity of operators defined on the list type, so in practice such programs are never written.

Next: Tokens Up: Concrete Syntax Previous: Grammar Contents

Allan Heydon, Roy Levin, Timothy Mann, Yuan Yu