A lexer interface and its implementation are generated automatically by m3build by running the command
where MyLang.t is a token specification, MyLangTok.i3 is a token interface, MyLang.l is a lexer specification, and MyLangLex.i3 is the generated lexer interface.
%expr { bare_STRING RegExp1 quoted_STRING RegExp2 } | define the expression methods named bare_STRING and quoted_STRING, which will be called whenever the respective expressions RegExp1 and RegExp2 are matched. See default method construction below. |
METHOD1 RegExp1 | same as above; the %expr { } wrapper is optional. |
%macro { MACRO1 RegExp1 MACRO2 RegExp2 } | define {MACRO1} to stand for RegExp1, and define {MACRO2} to stand for RegExp2. |
%macro MACRO RegExp | alternate syntax, same meaning as above. |
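For illustration, the two directives might be combined as below. This is a sketch rather than an excerpt from the klex distribution: it assumes a token specification that declares tokens named NUMBER and IDENT, and the macro names DIGIT and LETTER and the method names number_NUMBER and ident_IDENT are hypothetical. One definition per line is assumed to be how the braced forms above are laid out.

```
%macro {
  DIGIT   [0-9]
  LETTER  [a-zA-Z_]
}
%expr {
  number_NUMBER  {DIGIT}+
  ident_IDENT    {LETTER}({LETTER}|{DIGIT})*
}
```

Because the method names end in the (assumed) token names NUMBER and IDENT, their default methods are expected to produce those tokens rather than skip the match.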
The default method named char returns a constant token whose value equals the character code of the first matched character. If a method name is not char and does not have a suffix matching a token name, the default method returns NIL (instructing the lexer to skip the token) and a warning is printed. The warning is not printed, however, if the method is named skip; in that case, skipping is assumed to be the desired default behavior.
In addition, it is customary to define a token named ERROR, which does not ordinarily match any grammar rules. Thus a lexer specification will typically end with the following three lines:
    char   {%char}
    skip   [ \t]*
    ERROR  [^]

The behavior of any default method can be changed by overriding the method, for example using ext.
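A sketch of how these defaults interact, written in the form without the %expr { } wrapper. It assumes a token named NUMBER is declared in the token specification; the method names number_NUMBER and comment are hypothetical.

```
number_NUMBER  [0-9]+
comment        "#"[^\n]*
char           {%char}
skip           [ \t]*
ERROR          [^]
```

Under the rules above, number_NUMBER yields a NUMBER token; comment has no token-name suffix, so its default method skips the match and prints a warning unless the method is overridden; skip discards blanks and tabs silently; char returns the character code of any character declared %char; and ERROR, whose name is itself the token name ERROR, presumably returns that token for any remaining character, since [^] is a negated empty class that matches every character.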
x | match the character 'x' |
. | any character (byte) except newline |
[xyz] | a "character class"; in this case, the pattern matches either an 'x', a 'y', or a 'z' |
[abj-oZ] | a "character class" with a range in it; matches an 'a', a 'b', any letter from 'j' through 'o', or a 'Z' |
[^A-Z] | a "negated character class", i.e., any character but those in the class. In this case, any character EXCEPT an uppercase letter. |
[^A-Z\n] | any character EXCEPT an uppercase letter or a newline |
r* | zero or more r's, where r is any regular expression |
r+ | one or more r's |
r? | zero or one r's (that is, an optional r) |
r{2,5} | anywhere from two to five r's |
r{2,} | two or more r's |
r{4} | exactly 4 r's |
{NAME} | the expansion of macro NAME |
{%char} | the %char macro expands to the class of characters which were declared %char in the token interface. |
"[xyz]\"foo" | the literal string: [xyz]"foo |
\X | if X is an 'n' or a 't', then the ANSI-C interpretation of \X. Otherwise, a literal 'X' (used to escape operators such as '*') |
\123 | the character with octal value 123 |
(r) | match an r; parentheses are used to override precedence |
rs | the regular expression r followed by the regular expression s; called "concatenation" |
r|s | either an r or an s |
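The operators above compose in the usual way. The following sketch combines character classes, ranges, ?, +, *, bounded repetition, quoted literals, escapes, grouping, and alternation. The macro name DIGIT and the method names are hypothetical, and a token specification declaring NUMBER and STRING is assumed.

```
%macro {
  DIGIT  [0-9]
}
%expr {
  real_NUMBER    {DIGIT}+"."{DIGIT}*([eE][-+]?{DIGIT}+)?
  hex_NUMBER     0[xX][0-9a-fA-F]{1,8}
  quoted_STRING  \"([^\"\\\n]|\\.)*\"
}
```

For instance, real_NUMBER would match 3.14 and 1.e-5, hex_NUMBER would match 0x1F, and quoted_STRING would match "a\"b", because \\. consumes a backslash together with the character that follows it.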
The type T representing a lexer is declared as an opaque subtype of the RdLexer instantiated in the token interface. Hence the following uses are possible:
myLexer := NEW(MyLangLex.T).setRd(rd); | Initialize the new lexer myLexer using the reader rd: Rd.T. |
start := myParser.parse(NEW(MyLangLex.T).fromText(text)); | Parse text: TEXT using a new lexer and myParser. The interface which was used to initialize myParser must be compatible with MyLangLex.i3. |
There is no method named init; this leaves extended lexers free to define their own initialization methods with customized parameters.
Vern Paxson et al., flex: fast lexical analyzer generator
A. Aho, R. Sethi, and J. Ullman, Compilers: Principles, Techniques, and Tools
$Id: klex.html,v 1.3 2001/01/08 03:25:31 kp Exp $