The yacc tool
  
     _________________________________________________________________
                                      
   Most Unix systems, including ours, also supply yacc, a program
   generator that creates a parser -- that is, a program that recognizes
   complex syntactic constructions and performs specified actions when
   such constructions are completed. In a compiler, the action usually
   involves adding a component to a data structure that internally
   represents the structure of a program; in an interpreter, the action
   might be to execute an instruction or evaluate an expression that the
   parser recognizes.
   
   A parser constructed by yacc relies on the presence of a lexical
   analyzer to supply it with tokens; instead of invoking getchar() or
   some similar function to pick up input one character at a time, a
   yacc-derived parser invokes yylex() to pick up input one token at a
   time, and recognizes patterns in the token stream. The lexical
   analyzer can be one that is produced by lex -- lex and yacc were
   designed by the same people and intended to work cooperatively -- or
   it may simply be a function hand-coded by the programmer. In any case,
   it should be implemented as a function named yylex() that returns an
   integer value representing the next token in the input, or zero to
   indicate that the end of the input has been reached, and sets a global
   variable yylval of type int to indicate which particular instance of
   some token-category (``identifier,'' ``constant,'' etc.) was found.
   
   The yacc program allows the programmer to specify the syntactic
   constructions she is looking for by writing grammar rules. A grammar
   rule consists of the name of a syntactic category -- the type of
   syntactic construction that yacc is supposed to recognize -- followed
   by a colon, followed by an analysis, or possible structure, for items
   of that category. The analysis is simply a sequence of zero or more
   category names, token names, and token literals; the idea is that an
   instance of the syntactic category to the left of the colon can
   consist of a substructure for each of the category names to the right
   of the colon, together with specific tokens for each of the token
   names and token literals to the right of the colon.
   
   For instance, in the rule
   
STATEMENT : iftoken '(' EXPRESSION ')' STATEMENT

   STATEMENT and EXPRESSION are category names, iftoken is a token name,
   and '(' and ')' are token literals. The rule says that a statement can
   be made up of an ``if-token'' (presumably a token corresponding to the
   keyword if or some equivalent string of characters), a left
   parenthesis, a substructure of the EXPRESSION category, a right
   parenthesis, and a substructure of the STATEMENT category. Note that
   the analysis can include a reference to the same category that is
   being analyzed, so that syntactic categories can be defined
   recursively. (Of course, this means that if the process is to
   terminate, there must also be some other analysis for the STATEMENT
   category that does not involve recursion.)
   
   The heart of an input file for yacc consists of a sequence of one or
   more such grammar rules. It is common for one syntactic category to
   have two or more analyses, and in such cases the rules are usually
   combined by replacing the left-hand side and colon of all but the
   first analysis with a vertical bar:
   
STATEMENT : iftoken '(' EXPRESSION ')' STATEMENT
          | whiletoken '(' EXPRESSION ')' STATEMENT
          | fortoken '(' EXPRESSION ';' EXPRESSION ';' EXPRESSION ') STATEMENT
          | EXPRESSION ';' ;

   A rule may be accompanied by an action, to be taken when the
   recognition of an instance of the given syntactic category has been
   completed. The action is specified as a compound statement in C,
   enclosed in braces. The action may include invocation of user-defined
   functions, and indeed any kind of operation that can be expressed in
   C. (If no action is provided for a given rule, a semicolon should be
   placed at the end of the last alternative, as in the example just
   shown, to indicate the end of the rule.)
   
   In addition, each of the category names, token names, and token
   literals that occur in a rule has an integer value, and these values
   can be referred to in actions. An action can set the value of the
   syntactic category name that is being analyzed by executing a C
   assignment statement that has the special symbol $$ on its left-hand
   side. ($$ is simply shorthand for ``the value of the category name to
   the left of the colon.) An action can use the values of the names on
   the right-hand side of the rule by placing the special symbols $1, $2,
   $3, $4, ... in the code as if they were expressions of type int; $1
   refers to the value of the first category name, token name, or token
   literal on the right-hand side of the rule, $2 to the second, and so
   on. For instance, in the rule
   
EXPRESSION : '(' EXPRESSION ')' { $$ = $2; }

   the action ensures that the value associated with a parenthesized
   EXPRESSION construction is the same as the value associated with the
   unparenthesized EXPRESSION.
   
   The value of a category name that appears on the right-hand side of
   the rule is presumably set by an action in some other rule. The value
   of a token name or token literal is whatever was contained in the
   global variable yylval at the time the token was released by the
   lexical analyzer.
   
   In addition to grammar rules, a yacc input file may contain
   definitions that specify names for tokens and give precedence and
   associativity information about operators occurring in expressions.
   The most common kinds of definitions begin with one of the following
   indicators (starting in column 1):
   
     * %token -- the names that follow this indicator are understood to
       be token names. If such a name is followed by an integer, that
       integer is understood to be the int that identifies the token;
       otherwise, int values are supplied sequentially, beginning with
       257.
     * %start -- the category name that follows is the ``start symbol''
       for the parser, that is, the syntactic category that describes the
       entire input file (typically, PROGRAM).
     * %left -- indicates that the token name or literal that follows it
       is an operator that associates to the left.
     * %right -- indicates that the token name or literal that follows it
       is an operator that associates to the right.
     * %nonassoc -- indicates that the token name or literal that follows
       it is a non-associative operator that should not occur twice in a
       row within an expression.
       
   To be precise, a yacc input file consists of zero or more definition
   lines, a line containing just the character pair %%, and one or more
   rules. An additional line containing just the character pair %% can be
   appended at the end of the rules; it is optional.
   
   As in lex, one can also insert C code in the definition section of a
   yacc input file by preceding it with a line containing only the
   character pair %{ and following it with a line containing only the
   character pair %}. This is the conventional way of including global
   variable declarations and #include directives.
   
   Similarly, if there is a second %% line in the yacc input file,
   anything that follows it is assumed to be a programmer-supplied
   declaration and is inserted at the end of the output program, again as
   a global declaration.
   
   The parsing function created by yacc is called yyparse(); it takes no
   arguments and returns an integer -- 0 for a successful parse of the
   entire input file, a conventional error code otherwise. The parser can
   be built to stand alone (in which case the parser actions will include
   any output that the parser is to generate), but it is more usual for
   the programmer to create her own main program that invokes yyparse()
   at the appropriate point.
   
   Different implementations of yacc give different names to the output
   file. The most common name is y.tab.c. On our system, however, the
   recommended version of yacc is the one distributed by the Free
   Software Foundation, /usr/local/bin/bison, which chooses the name of
   its output file by replacing the terminal .y in the name of its input
   file with .tab.c.
   
   Here is the sequence of commands one would give to generate a lexical
   analyzer from a lex input file proj.l, a parser from a yacc input file
   proj.y, and then to incorporate the lexical analyzer and parser into a
   program proj.c that invokes yyparse():
   
flex proj.l
gcc -ansi -c lex.yy.c
bison proj.y
gcc -ansi -c proj.tab.c
gcc -ansi -c proj.c
gcc -ansi -o proj proj.o proj.tab.o lex.yy.o -ly -ll

   The -ly option links in the yacc support library, and the -ll option
   links in the lex support library.
   
   An extended example of the use of lex and yacc to construct a compiler
   for a (very small) programming language can be found on MathLAN in the
   directory /u2/stone/courses/software-design/examples/yacc. Read the
   opening comment in the file Strans.c for an overview of the project
   and a definition of the S programming language.
   
     _________________________________________________________________