parserlib: kyacc: defining the parser

A parser interface and its implementation are generated automatically by m3build by running the command

kyacc MyLang.y [ -t MyLang.t [-ti3 MyLangTok.i3] ] [ -o MyLangParse.i3 ]

where MyLang.t is a token specification, MyLangTok.i3 is a token interface, MyLang.y is a parser specification, and MyLangParse.i3 is the generated parser interface.

parser specification

A parser specification is a file with the .y suffix that specifies an LR(1) grammar for which a parser interface is to be generated (the LR(1) grammars include the LALR(1) grammars; see LR(1) versus LALR(1)). The specification is given as a list of BNF grammar rules. Unlike UNIX Yacc, each reduction rule is associated with a reduction method, which must be given a name. Each line of the file must have one of the following forms:

returnType:
    Declare the grammar symbol returnType and the Modula-3 type returnType, which will be returned by any of the following reduction methods.

ruleName sym1 'c'
    Declare a reduction method that reduces sym1 followed by 'c' to the last declared symbol returnType.

%left sym1
%right ruleName_returnType
%nonassoc sym2
    Assign left associativity to any rules containing the symbol sym1, and assign a higher precedence and right associativity to the rule named ruleName with return type returnType. Assign still higher precedence to any rules containing the symbol sym2, but still warn of any shift/reduce conflicts.

%start sym1 sym2
    Declare types sym1 and sym2 as subtypes of StartType (see below).
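
As an illustration of how these forms combine, here is a minimal sketch of a specification for arithmetic expressions. Everything in it is hypothetical: the token NUMBER is assumed to be declared in the token interface, the rule names are invented, and the indentation and the ordering of the % lines relative to the rules are assumptions rather than documented requirements.

```
%start expr

%left '+'
%left '*'
%right neg_expr

expr:
  number NUMBER
  paren '(' expr ')'
  add expr '+' expr
  mul expr '*' expr
  neg '-' expr
```

Under the rules above, this would declare the reduction methods number_expr, paren_expr, add_expr, mul_expr, and neg_expr. Rules containing '*' would bind more tightly than rules containing '+', and the rule named neg would have the highest precedence of the three.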

The set of valid grammar symbols consists of whatever tokens were declared in the token interface, plus whatever reduction method return types are declared as above.

The reduction method names have the form ruleName_returnType; kext expects the methods to have these names. To avoid Modula-3 name conflicts, parse types should not contain underscores.

parser interface

A parser interface is a Modula-3 interface defining a type T representing a parser. Additionally the following types are declared:

StartType <: MyLangTok.ParseType
OtherType <: MyLangTok.ParseType

and all reduction method return types are declared as subtypes of either StartType or OtherType. Hence all ParseTypes of importance (i.e. those appearing as parameters in reduction methods) are either StartTypes, OtherTypes, or MyLangTok.Tokens.

In addition to the reduction methods, the parser type T also generically contains the following:

  METHODS
    setLex(lex: MyLangTok.Lexer): T;
    parse(exhaustInput: BOOLEAN := TRUE): StartType;

A generated parser is initialized as often as necessary by calling its setLex method, with a compatible lexer given as an argument. There is no method named init, to allow customized initialization parameters in extended lexers.
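
The following sketch shows how a client might drive a generated parser using only setLex and parse. The module name ParseDemo and the procedure ParseBoth are hypothetical, the two lexers are assumed to be constructed elsewhere, and allocating the parser with a plain NEW is an assumption (in practice an extended parser type would likely be used).

```modula3
MODULE ParseDemo;
(* Hypothetical client module; assumes a matching ParseDemo interface
   that declares ParseBoth. *)

IMPORT MyLangTok, MyLangParse;

PROCEDURE ParseBoth(lexA, lexB: MyLangTok.Lexer;
                    VAR first, second: MyLangParse.StartType) =
  VAR parser := NEW(MyLangParse.T);
  BEGIN
    (* setLex returns a T, so the call to parse can be chained. *)
    first := parser.setLex(lexA).parse();
    (* The same parser is reinitialized by calling setLex again. *)
    second := parser.setLex(lexB).parse();
  END ParseBoth;

BEGIN
END ParseDemo.
```

Both calls rely on the default exhaustInput = TRUE.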

If parse is called with exhaustInput = FALSE, then the parser will continue reading tokens until reading another token would cause a syntax error. This may or may not require peeking ahead one token; if peeking is required and the last token would cause an error, the parser calls lex.unget(). It returns the StartType representing everything up to just before the error. This feature is useful for parsing a language block whose end is delimited by some token not meaningful in that language, such as an unmatched '}'.
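
A minimal sketch of this use follows. The module name BlockDemo and the procedure ParseBlock are hypothetical; the lexer is assumed to be positioned just after the opening delimiter of an embedded MyLang block.

```modula3
MODULE BlockDemo;
(* Hypothetical client module; assumes a matching BlockDemo interface
   that declares ParseBlock. *)

IMPORT MyLangTok, MyLangParse;

PROCEDURE ParseBlock(parser: MyLangParse.T;
                     lex: MyLangTok.Lexer): MyLangParse.StartType =
  BEGIN
    (* Stop at the first token that cannot extend the parse, such as
       an unmatched '}', instead of reading the whole input. *)
    RETURN parser.setLex(lex).parse(exhaustInput := FALSE);
  END ParseBlock;

BEGIN
END BlockDemo.
```

Because the parser ungets a token that it only peeked at, the caller should find the delimiter still available from the same lexer once ParseBlock returns.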

see also

S. C. Johnson, Yacc: Yet Another Compiler-Compiler



$Id: kyacc.html,v 1.4 2001/01/08 07:08:02 kp Exp $