The program bisonc++ is based on previous work on bison by Alain Coetmeur (coetmeur@icdc.fr), who, in the early '90s, created a C++ class encapsulating the yyparse() function as generated by the GNU-bison parser generator.
Initial versions of bisonc++ (up to version 0.92) wrapped Alain's program in a program offering a more modern user-interface, removing all old-style (C) %define directives from bison++'s input specification file (see below for an in-depth discussion of the differences between bison++ and bisonc++). Starting with version 0.98, bisonc++ is compiled from a complete rebuild of the parser generator, closely following the description in Aho, Sethi and Ullman's Dragon Book. Moreover, starting with version 0.98 bisonc++ is itself a C++ program, rather than a C program generating C++ code.
Bisonc++ expands the concepts initially implemented in bison and bison++, offering a cleaner setup of the generated parser class. The parser class is derived from a base-class, mainly containing the parser's token- and type-definitions as well as several member functions which should not be (re)defined by the programmer.
Most of these base-class members might also be defined directly in the parser class, but were defined in the parser's base-class. This design results in a very lean parser class, declaring only members that are actually defined by the programmer or that must be defined by bisonc++ itself (e.g., the member function parse() as well as those support functions requiring access to facilities that are only available in the parser class itself, rather than in the parser's base class).
Moreover, this design does not require the use of virtual members: the members which are not involved in the actual parsing process may always be (re)implemented directly by the programmer. Thus there is no need to apply or define virtual member functions.
In fact, there are only two public members in the parser class generated by bisonc++: setDebug() (see below) and parse(). The remaining members are private, and those that can be redefined by the programmer using bisonc++ usually receive initial, very simple default in-line implementations. The (partial) exception to this rule is the member function lex(), producing the next lexical token. For lex() either a standardized interface or a mere declaration is offered (the latter requiring the programmer to provide a tailor-made implementation for lex()).
To enforce a primitive namespace, bison used a well-known naming-convention: all its public symbols started with yy or YY. Bison++ followed bison in this respect, even though a class by itself offers enough protection of its identifiers. The present author feels that these yy and YY conventions are outdated, and consequently bisonc++ does not generate any symbols starting with yy or YY, neither in the parser (base) class nor in the parser function. Instead, following a suggestion by Lakos (2001), all data members have names starting with d_, and all static data members have names starting with s_. This convention was not introduced to enforce identifier protection, but to clarify the storage type of variables. Other (local) symbols lack specific prefixes. Furthermore, bisonc++ allows its users to define the parser class in a namespace of their own choice.
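The naming convention can be illustrated with a small sketch. The namespace Calc and the class Counter are hypothetical and are not generated by bisonc++; they merely show how the d_ and s_ prefixes distinguish per-object from static data inside a user-chosen namespace.

```cpp
#include <cstddef>

namespace Calc                  // hypothetical namespace chosen by the user
{
    class Counter               // hypothetical class, only for illustration
    {
        std::size_t d_count = 0;            // d_: ordinary (per-object) data member
        static std::size_t s_instances;     // s_: static data member

        public:
            Counter()                       { ++s_instances; }
            void hit()                      { ++d_count; }
            std::size_t count() const       { return d_count; }
            static std::size_t instances()  { return s_instances; }
    };

    std::size_t Counter::s_instances = 0;   // the single shared counter
}
```

Reading such code, the prefix alone tells whether a variable belongs to one object or is shared by all objects of the class.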
Bisonc++ should be used as follows:
int main() { Parser parser; return parser.parse(); }
Bisonc++ may create the following files:
int lex() { return d_scanner.yylex(); }
and an object Scanner d_scanner will be composed into the parser. The d_scanner object will be constructed using its default constructor. If another constructor is required, the parser class may be provided with an appropriate (overloaded) parser constructor after having constructed the default parser class header file using bisonc++. By default header will be surrounded by double quotes (using, e.g., #include "header"). When the argument is surrounded by angle brackets, #include <header> will be used. In the latter case, quotes might be required to prevent interpretation by the shell (e.g., using -s '<header>').
The following directives can be used in the initial section of the grammar specification file. When command-line options for directives exist, they overrule the corresponding directives given in the grammar specification file.
struct LTYPE
{
    int timestamp;
    int first_line;
    int first_column;
    int last_line;
    int last_column;
    char *text;
};
expression: '-' expression %prec UMINUS { ... }
Here, the default priority and precedence of the `-' token as the subtraction operator are overruled by the precedence and priority of the UMINUS token, which is commonly defined as
%right UMINUS
(see below) following, e.g., the '*' and '/' operators.
The following public members can be used by users of the parser classes generated by bisonc++ (`Parser Class':: prefixes are silently implied):
The following enumerations and types can be used by members of parser classes generated by bisonc++. When prefixed by Base:: they are actually protected members inherited from the parser's base class.
DEFAULT_RECOVERY_MODE, UNEXPECTED_TOKEN
DEFAULT_RECOVERY_MODE consists of terminating the parsing process. UNEXPECTED_TOKEN activates the recovery procedure whenever an error is encountered. The recovery procedure consists of looking for the first state on the state-stack having an error-production, and then skipping subsequent tokens until (in that state) a token is retrieved which may follow the error terminal token in that production rule. If this error recovery procedure fails (i.e., if no acceptable token is ever encountered) error recovery falls back to the default recovery mode, terminating the parsing process.
PARSE_ACCEPT = 0, PARSE_ABORT = 1
(which are of course the parse() function's return values).
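The recovery procedure described for UNEXPECTED_TOKEN can be sketched as follows. This is only an illustration under simplifying assumptions: the struct State, its followError set and the function recover() are hypothetical and do not reflect the tables or members actually generated by bisonc++.

```cpp
#include <cstddef>
#include <deque>
#include <set>
#include <vector>

// Hypothetical, strongly simplified description of a parser state.
struct State
{
    bool hasErrorItem;          // the state has at least one error-production
    std::set<int> followError;  // tokens that may follow `error' in this state
};

// Returns true if parsing may continue: `stack' then ends at the recovering
// state and input.front() is an acceptable token. Returns false if recovery
// fails, in which case the caller would terminate the parsing process.
// Assumes that every value on `stack' is a valid index into `states'.
bool recover(std::vector<State> const &states,
             std::vector<std::size_t> &stack,
             std::deque<int> &input)
{
    // 1. find the first state on the stack having an error-production
    while (!stack.empty() && !states[stack.back()].hasErrorItem)
        stack.pop_back();

    if (stack.empty())
        return false;

    // 2. skip tokens until one is seen that may follow the error token
    std::set<int> const &acceptable = states[stack.back()].followError;
    while (!input.empty() && acceptable.count(input.front()) == 0)
        input.pop_front();

    return !input.empty();
}
```

With a single error-recovering state accepting a newline, a stack of three states and the pending tokens '+', '3', newline, the sketch pops two states and skips two tokens, after which the newline is the next token to be handled.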
The following private members can be used by members of parser classes generated by bisonc++. When prefixed by Base:: they are actually protected members inherited from the parser's base class.
The following private data members can be used by members of parser classes generated by bisonc++. All data members are actually protected members inherited from the parser's base class.
d_scanner.setSLoc(&d_loc);
Subsequently, the lexical scanner may assign a value to the parser's d_loc variable through the pointer to d_loc stored inside the lexical scanner.
d_scanner.setSval(&d_val);
Subsequently, the lexical scanner may assign a value to the parser's d_val variable through the pointer to d_val stored inside the lexical scanner.
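The pointer-passing mechanism used for d_val (and, analogously, d_loc) might be sketched like this. The Scanner class below, including its setSval() member, is a hypothetical stand-in, the token value 257 and the semantic value 42 are made up for the example, and value() is a hypothetical accessor.

```cpp
typedef int STYPE;              // semantic value type, as in the example parser

// Hypothetical stand-in for a lexical scanner that stores a pointer to the
// parser's semantic value.
class Scanner
{
    STYPE *d_sval = nullptr;    // points at the parser's d_val once set

    public:
        void setSval(STYPE *sval)
        {
            d_sval = sval;
        }
        int yylex()
        {
            if (d_sval)
                *d_sval = 42;   // pretend the matched text was "42"
            return 257;         // made-up token value standing in for NUMBER
        }
};

class Parser
{
    STYPE d_val = 0;
    Scanner d_scanner;

    public:
        Parser()
        {
            d_scanner.setSval(&d_val);  // hand the scanner a pointer to d_val
        }
        int lex()
        {
            return d_scanner.yylex();
        }
        STYPE value() const             // hypothetical accessor for the demo
        {
            return d_val;
        }
};
```

After a call to lex() the parser's d_val holds the value assigned by the scanner, without the scanner needing to know the parser class itself.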
In the file defining the parse() function the following types and variables are defined in the anonymous namespace. These are mentioned here for the sake of completeness, and are not normally accessible to other parts of the parser.
_UNDETERMINED_ = -2, _EOF_ = -1, _error_ = 256,
These tokens are used by the parser to determine whether another token should be requested from the lexical scanner, and to handle error-conditions.
NORMAL, HAS_ERROR_ITEM, IS_ERROR_STATE,
HAS_ERROR_ITEM is used for a state having at least one error-production. IS_ERROR_STATE is used for a state from which error recovery is attempted: while in such a state, tokens are retrieved until the parser sees a token from which parsing may continue. All other states are NORMAL states.
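How a reserved value like _UNDETERMINED_ can signal that a new token must be requested is sketched below. This is an assumption about the general idea, not the code generated by bisonc++: the class TokenSource and its members are hypothetical, and the enumerators were renamed because identifiers starting with an underscore followed by a capital are reserved in user code.

```cpp
#include <deque>
#include <utility>

// Values taken from the text; names changed for this sketch.
enum
{
    UNDETERMINED = -2,          // no token is pending
    END_OF_INPUT = -1           // the scanner has no more tokens
};

class TokenSource               // hypothetical class, only for illustration
{
    std::deque<int> d_input;    // stands in for the lexical scanner
    int d_token = UNDETERMINED;

    public:
        explicit TokenSource(std::deque<int> input)
        :
            d_input(std::move(input))
        {}
        int peek()              // requests a token only if none is pending
        {
            if (d_token == UNDETERMINED)
                d_token = lex();
            return d_token;
        }
        void consume()          // marks the pending token as used
        {
            d_token = UNDETERMINED;
        }

    private:
        int lex()
        {
            if (d_input.empty())
                return END_OF_INPUT;
            int token = d_input.front();
            d_input.pop_front();
            return token;
        }
};
```

Repeated calls of peek() return the same token until consume() is called, which mirrors the way a lookahead token is kept until it has been shifted.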
All DECLARATIONS and DEFINE symbols not listed above but defined in bison++ are obsolete with bisonc++. In particular, there is no %header{ ... %} section anymore. Also, all DEFINE symbols related to member functions are now obsolete. There is no need for these symbols anymore as they can simply be declared in the class header file and defined elsewhere.
Using a fairly worn-out example, we'll construct a simple calculator below. The basic operators as well as parentheses can be used to specify expressions, and each expression should be terminated by a newline. The program terminates when a q is entered. Empty lines result in a mere prompt.
First an associated grammar is constructed. When a syntactical error is encountered all tokens are skipped until the next newline and a simple message is printed using the default error() function. It is assumed that no semantic errors occur (in particular, no divisions by zero). The grammar is decorated with actions performed when the corresponding grammatical production rule is recognized. The grammar itself is rather standard and straightforward, but note the first part of the specification file, containing various other directives, among which the %scanner directive, resulting in a composed d_scanner object as well as an implementation of the member function int lex(). In this example, a common Scanner class construction strategy was used: the class Scanner was derived from the class yyFlexLexer generated by flex++(1). The actual process of constructing a class using flex++(1) is beyond the scope of this man-page, but flex++(1)'s specification file is mentioned below, to further complete the example. Here is bisonc++'s input file:
%filenames parser
%scanner ../scanner/scanner.h

                                // lowest precedence
%token  NUMBER                  // integral numbers
        EOLN                    // newline

%left   '+' '-'
%left   '*' '/'
%right  UNARY
                                // highest precedence
%%

expressions:
    expressions evaluate
|
    prompt
;

evaluate:
    alternative prompt
;

prompt:
    {
        prompt();
    }
;

alternative:
    expression EOLN
    {
        cout << $1 << endl;
    }
|
    'q' done
|
    EOLN
|
    error EOLN
;

done:
    {
        cout << "Done.\n";
        ACCEPT();
    }
;

expression:
    expression '+' expression   { $$ = $1 + $3; }
|
    expression '-' expression   { $$ = $1 - $3; }
|
    expression '*' expression   { $$ = $1 * $3; }
|
    expression '/' expression   { $$ = $1 / $3; }
|
    '-' expression %prec UNARY  { $$ = -$2; }
|
    '+' expression %prec UNARY  { $$ = $2; }
|
    '(' expression ')'          { $$ = $2; }
|
    NUMBER                      { $$ = atoi(d_scanner.YYText()); }
;

Next, bisonc++ processes this file. In the process, bisonc++ generates the following files from its skeletons:
#ifndef ParserBase_h_included
#define ParserBase_h_included

#include <vector>
#include <iostream>

namespace // anonymous
{
    struct PI;
}

class ParserBase
{
    public:
// $insert tokens

    // Symbolic tokens:
    enum Tokens
    {
        NUMBER = 257,
        EOLN,
        UNARY,
    };

// $insert STYPE
    typedef int STYPE;

    private:
        int d_stackIdx;
        std::vector<size_t> d_stateStack;
        std::vector<STYPE>  d_valueStack;

    protected:
        enum Return
        {
            PARSE_ACCEPT = 0,   // values used as parse()'s return values
            PARSE_ABORT  = 1
        };
        enum ErrorRecovery
        {
            DEFAULT_RECOVERY_MODE,
            UNEXPECTED_TOKEN,
        };
        bool    d_debug;
        size_t  d_nErrors;
        int     d_token;
        int     d_nextToken;
        size_t  d_state;
        STYPE  *d_vsp;
        STYPE   d_val;

        ParserBase();

        void ABORT() const throw(Return);
        void ACCEPT() const throw(Return);
        void ERROR() const throw(ErrorRecovery);
        void clearin();
        bool debug() const;
        void pop(size_t count = 1);
        void push(size_t nextState);
        void reduce(PI const &productionInfo);
        size_t top() const;

    public:
        void setDebug(bool mode);
};

inline bool ParserBase::debug() const
{
    return d_debug;
}

inline void ParserBase::setDebug(bool mode)
{
    d_debug = mode;
}

// As a convenience, when including ParserBase.h its symbols are available as
// symbols in the class Parser, too.
#define Parser ParserBase

#endif
Once bisonc++ had created parser.h it was augmented with the required additional members, resulting in the following final version:
#ifndef Parser_h_included
#define Parser_h_included

// $insert baseclass
#include "parserbase.h"
// $insert scanner.h
#include "../scanner/scanner.h"

#undef Parser
class Parser: public ParserBase
{
    // $insert scannerobject
    Scanner d_scanner;

    public:
        int parse();

    private:
        void error(char const *msg);    // called on (syntax) errors
        int lex();                      // returns the next token from the
                                        // lexical scanner.
        void print();                   // use, e.g., d_token, d_loc

        void prompt();
        void done();

    // support functions for parse():
        void executeAction(int ruleNr);
        void errorRecovery();
        int lookup(bool recovery);
        void nextToken();
};

inline void Parser::error(char const *msg)
{
    std::cerr << msg << std::endl;
}

// $insert lex
inline int Parser::lex()
{
    return d_scanner.yylex();
}

inline void Parser::print()      // use d_token, d_loc
{}

inline void Parser::prompt()
{
    std::cout << "? " << std::flush;
}

inline void Parser::done()
{
    std::cout << "Done\n";
    ACCEPT();
}

#endif
%{
#define _SKIP_YYFLEXLEXER_
#include "scanner.ih"

#include "../parser/parser.h"
%}

%option yyclass="Scanner" outfile="yylex.cc"
%option c++ 8bit warn noyywrap yylineno

%%

[ \t]+                      // skip white space
\n                          return Parser::EOLN;
[0-9]+                      return Parser::NUMBER;
.                           return yytext[0];

%%
// include this file in the sources of the class Calculator,
// and add any includes etc. that are only needed for
// the compilation of these sources.

// include the file defining the parser class:
#include "parser.h"

// UN-comment if you don't want to prefix std::
// for every symbol defined in the std. namespace:
using namespace std;

The implementation of the parsing member function parse() is basically irrelevant, since it should not be modified by the programmer. It was written to the file parse.cc.
#include "parser/parser.h"

int main()
{
    Parser calculator;
    return calculator.parse();
}
Note here that although the file parserbase.h, defining the parser class's base class, rather than the header file parser.h defining the parser class, is included, the lexical scanner may simply return tokens of the class Calculator (e.g., Calculator::NUMBER rather than CalculatorBase::NUMBER). In fact, using a simple #define - #undef pair generated by bisonc++ at, respectively, the end of the base class header file and just before the definition of the parser class itself, it is possible to assume in the lexical scanner that all symbols defined in the parser's base class are actually defined in the parser class itself. It should be noted that this feature can only be used to access the base class's enums and types. The actual parser class is not available by the time the lexical scanner is defined, thus avoiding circular class dependencies.
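The effect of the #define - #undef pair can be reproduced in a single source file. The sketch below uses the names Parser and ParserBase from the listings above; the function scanNumber() is a hypothetical stand-in for code in the lexical scanner, and the three sections stand in for what would normally be separate files.

```cpp
// --- end of the base class header -------------------------------------
class ParserBase
{
    public:
        enum Tokens
        {
            NUMBER = 257,
            EOLN,
        };
};
#define Parser ParserBase       // from here on `Parser' reads as `ParserBase'

// --- scanner code, seeing only the base class ---------------------------
inline int scanNumber()         // hypothetical scanner function
{
    return Parser::NUMBER;      // expands to ParserBase::NUMBER
}

// --- just before the definition of the parser class ---------------------
#undef Parser                   // `Parser' is an ordinary identifier again
class Parser: public ParserBase
{
};
```

Since the macro only substitutes a name, it gives access to the base class's enums and types but to nothing that is declared in the derived parser class.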
Lakos, J. (2001) Large Scale C++ Software Design, Addison Wesley.
Aho, A.V., Sethi, R., Ullman, J.D. (1986) Compilers, Addison Wesley.
The member function void lookup (< 1.00) was replaced by int lookup. When regenerating parsers created by early versions of bisonc++ (versions before version 1.00), lookup's prototype should be corrected by hand, since bisonc++ will not by itself rewrite the parser class's header file.
Bisonc++ was based on bison++, originally developed by Alain Coetmeur (coetmeur@icdc.fr), R&D department (RDT), Informatique-CDC, France, who based his work on bison, GNU version 1.21.
Bisonc++ version 0.98 and beyond is a complete rewrite of an LALR-1 parser generator, closely following the construction process as described in Aho, Sethi and Ullman's (1986) book Compilers (i.e., the Dragon book). It uses the same grammar specification as bison and bison++, and it uses practically the same options and directives as bisonc++ versions earlier than 0.98. Variables, declarations and macros that are obsolete were removed. Since bisonc++ is a completely new program, it will most likely contain bugs. Please report bugs to the author:
Frank B. Brokken (f.b.brokken@rug.nl).