bisonc++

bisonc++.2.4.7.tar.gz

2005-2009


bisonc++(1)

bisonc++.2.4.7.tar.gz bisonc++ parser generator

NAME

bisonc++ - Generate a C++ parser class and parsing function

SYNOPSIS

bisonc++ [OPTIONS] grammar-file

DESCRIPTION

The program bisonc++ is based on previous work on bison by Alain Coetmeur (coetmeur@icdc.fr), who created in the early '90s a C++ class encapsulating the yyparse() function as generated by the GNU-bison parser generator.

Initial versions of bisonc++ (up to version 0.92) wrapped Alain's program in a program offering a more modern user-interface, removing all old-style (C) %define directives from bison++'s input specification file (see below for an in-depth discussion of the differences between bison++ and bisonc++). Starting with version 0.98, bisonc++ is compiled from a complete rebuild of the parser generator, closely following the description in Aho, Sethi and Ullman's Dragon Book. Moreover, starting with version 0.98 bisonc++ is a C++ program, rather than a C program generating C++ code.

Bisonc++ expands the concepts initially implemented in bison and bison++, offering a cleaner setup of the generated parser class. The parser class is derived from a base-class, mainly containing the parser's token- and type-definitions as well as several member functions which should not be (re)defined by the programmer.

Most of these base-class members could also have been defined directly in the parser class, but were placed in the parser's base class instead. This design results in a very lean parser class, declaring only members that are actually defined by the programmer or that must be defined by bisonc++ itself (e.g., the member function parse() as well as those support functions requiring access to facilities that are only available in the parser class itself, rather than in the parser's base class).

Moreover, this design does not require the use of virtual members: the members which are not involved in the actual parsing process may always be (re)implemented directly by the programmer. Thus there is no need to apply or define virtual member functions.

In fact, there are only two public members in the parser class generated by bisonc++: setDebug() (see below) and parse(). Remaining members are private, and those that can be redefined by the programmer using bisonc++ usually receive initial, very simple default in-line implementations. The (partial) exception to this rule is the member function lex(), producing the next lexical token. For lex() either a standardized interface or a mere declaration is offered (requiring the programmer to provide a tailor-made implementation for lex()).
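The layout described above can be pictured with a small hand-written sketch. This is not the code bisonc++ emits: the names SketchParserBase, SketchParser and runSketch, the token values, and the vector standing in for a scanner are all hypothetical, and parse() here merely counts tokens instead of running an LALR(1) automaton. It only illustrates the division of labour: token definitions and bookkeeping in the base class, parse() and setDebug() as the only public members of the parser class, and a private, non-virtual, tailor-made lex().

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

// Hypothetical stand-in for the generated base class: it holds the token
// definitions and bookkeeping data that user code should not redefine.
class SketchParserBase
{
    public:
        enum Tokens { NUMBER = 257, EOLN };   // illustrative values only

    protected:
        bool d_debug = false;
        std::size_t d_nErrors = 0;            // unused in this sketch
};

// Hypothetical stand-in for the lean parser class: only parse() and
// setDebug() are public, and no virtual members are needed.
class SketchParser: public SketchParserBase
{
    std::vector<int> d_input;                 // replaces a real scanner
    std::size_t d_pos = 0;

    public:
        explicit SketchParser(std::vector<int> input)
        :
            d_input(std::move(input))
        {}

        void setDebug(bool mode)
        {
            d_debug = mode;
        }

        // Assumption: returns the number of tokens read before the end of
        // the input. A real parse() would drive the parsing tables.
        int parse()
        {
            int count = 0;
            while (int token = lex())
            {
                if (d_debug)
                    std::cerr << "token " << token << '\n';
                ++count;
            }
            return count;
        }

    private:
        // Tailor-made lex(): the next token, or 0 at the end of the input.
        int lex()
        {
            return d_pos < d_input.size() ? d_input[d_pos++] : 0;
        }
};

// Feeds two tokens to the sketch parser and returns what parse() reports.
int runSketch()
{
    std::vector<int> tokens{SketchParserBase::NUMBER, SketchParserBase::EOLN};
    SketchParser parser(tokens);
    int count = parser.parse();
    assert(count == 2);                       // two tokens were supplied
    return count;
}
```

Because lex() is an ordinary private member, replacing it means editing the parser class itself; no overriding through a virtual call is involved.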

To enforce a primitive namespace, bison used a well-known naming convention: all its public symbols started with yy or YY. Bison++ followed bison in this respect, even though a class by itself offers enough protection of its identifiers. The present author feels that these yy and YY conventions are outdated, and consequently bisonc++ does not generate any symbols starting with yy or YY, either in the parser (base) class or in the parser function. Instead, following a suggestion by Lakos (2001), all data members have names starting with d_, and all static data members have names starting with s_. This convention was not introduced to enforce identifier protection, but to clarify the storage type of variables. Other (local) symbols lack specific prefixes. Furthermore, bisonc++ allows its users to define the parser class in a namespace of their own choice.
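As a brief illustration of the naming convention, the following sketch shows a class inside a user-chosen namespace. The namespace Calc, the class Counter and all its members are hypothetical and are not generated by bisonc++; only the d_ and s_ prefixes follow the convention described above.

```cpp
#include <cassert>
#include <cstddef>

namespace Calc                              // hypothetical user-chosen namespace
{
    class Counter
    {
        std::size_t d_count = 0;            // d_: non-static data member
        static std::size_t s_instances;     // s_: static data member

        public:
            Counter()
            {
                ++s_instances;
            }

            void bump()
            {
                std::size_t step = 1;       // local variable: no prefix
                d_count += step;
            }

            std::size_t count() const
            {
                return d_count;
            }

            static std::size_t instances()
            {
                return s_instances;
            }
    };

    std::size_t Counter::s_instances = 0;

    // Small self-check: one bump results in a count of one.
    inline void demo()
    {
        Counter counter;
        counter.bump();
        assert(counter.count() == 1);
    }
}
```

The prefix tells the reader at a glance whether a name refers to per-object state, class-wide state, or a local variable.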

Bisonc++ should be used as follows:

GENERATED FILES

Bisonc++ may create the following files:

OPTIONS

If available, single letter options are listed between parentheses following their associated long-option variants. Single letter options require arguments if their associated long options require arguments as well.

DIRECTIVES

The following directives can be used in the initial section of the grammar specification file. When command-line options for directives exist, they overrule the corresponding directives given in the grammar specification file.

PUBLIC MEMBERS AND -TYPES

The following public members can be used by users of the parser classes generated by bisonc++ (`Parser Class':: prefixes are silently implied):

PROTECTED ENUMS AND -TYPES

The following enumerations and types can be used by members of parser classes generated by bisonc++. They are actually protected members inherited from the parser's base class.

PRIVATE MEMBER FUNCTIONS

The following members can be used by members of parser classes generated by bisonc++. When prefixed by Base:: they are actually protected members inherited from the parser's base class. Members for which the phrase ``Used internally'' is used should not be called by user-defined code.

PROTECTED DATA MEMBERS

The following data members can be used by members of parser classes generated by bisonc++. All data members are actually protected members inherited from the parser's base class.

TYPES AND VARIABLES IN THE ANONYMOUS NAMESPACE

In the file defining the parse() function the following types and variables are defined in the anonymous namespace. These are mentioned here for the sake of completeness, and are not normally accessible to other parts of the parser.

RESTRICTIONS ON TOKEN NAMES

To avoid collisions with names defined by the parser's (base) class, the following identifiers should not be used as token names:

OBSOLETE SYMBOLS

All DECLARATIONS and DEFINE symbols not listed above but defined in bison++ are obsolete with bisonc++. In particular, there is no %header{ ... %} section anymore. Also, all DEFINE symbols related to member functions are now obsolete. There is no need for these symbols anymore as they can simply be declared in the class header file and defined elsewhere.

EXAMPLE

Using a fairly worn-out example, we'll construct a simple calculator below. The basic operators as well as parentheses can be used to specify expressions, and each expression should be terminated by a newline. The program terminates when a q is entered. Empty lines result in a mere prompt.

First an associated grammar is constructed. When a syntactical error is encountered all tokens are skipped until the next newline and a simple message is printed using the default error() function. It is assumed that no semantic errors occur (in particular, no divisions by zero). The grammar is decorated with actions performed when the corresponding grammatical production rule is recognized. The grammar itself is rather standard and straightforward, but note the first part of the specification file, containing various other directives, among which the %scanner directive, resulting in a composed d_scanner object as well as an implementation of the member function int lex(). In this example, a common Scanner class construction strategy was used: the class Scanner was derived from the class yyFlexLexer generated by flex++(1). The actual process of constructing a class using flex++(1) is beyond the scope of this man-page, but flex++(1)'s specification file is mentioned below, to further complete the example. Here is bisonc++'s input file:

%filenames parser
%scanner ../scanner/scanner.h

                                // lowest precedence
%token  NUMBER                  // integral numbers
        EOLN                    // newline

%left   '+' '-' 
%left   '*' '/' 
%right  UNARY
                                // highest precedence 

%%

expressions:
    expressions 
    evaluate
|
    prompt
;

evaluate:
    alternative
    prompt
;

prompt:
    {
        prompt();
    }
;

alternative:
    expression
    EOLN
    {
        cout << $1 << endl;
    }
|
    'q'
    done
|
    EOLN
|
    error
    EOLN
;

done:
    {
        cout << "Done.\n";
        ACCEPT();
    }
;

expression:
    expression 
    '+'
    expression
    {
        $$ = $1 + $3;
    }
|
    expression 
    '-'
    expression
    {
        $$ = $1 - $3;
    }
|
    expression 
    '*'
    expression
    {
        $$ = $1 * $3;
    }
|
    expression 
    '/'
    expression
    {
        $$ = $1 / $3;
    }
|
    '-'             
    expression      %prec UNARY
    {
        $$ = -$2;
    }
|
    '+'             
    expression      %prec UNARY
    {
        $$ = $2;
    }
|
    '('
    expression
    ')'
    {
        $$ = $2;
    }
|
    NUMBER
    {
        $$ = atoi(d_scanner.YYText());
    }
;
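
The values that the actions in this grammar should produce can be cross-checked with a small hand-written evaluator. Note that this is a recursive-descent evaluator, a different technique from the LALR(1) parser bisonc++ generates; it is given only as a reference for the precedence and associativity declared above. The names MiniCalc, calc and demo are hypothetical, blanks are skipped by hand instead of by a scanner, and, as in the grammar, malformed input and division by zero are assumed not to occur.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Mirrors the grammar's precedence levels: '+' '-' (lowest, left),
// '*' '/' (left), unary '+' '-' (highest).
class MiniCalc
{
    std::string d_text;
    std::size_t d_pos = 0;

    public:
        explicit MiniCalc(std::string text)
        :
            d_text(std::move(text))
        {}

        int evaluate()
        {
            return additive();
        }

    private:
        char peek()                     // skip blanks, show next character
        {
            while (d_pos < d_text.size() && d_text[d_pos] == ' ')
                ++d_pos;
            return d_pos < d_text.size() ? d_text[d_pos] : '\0';
        }

        int additive()                  // %left '+' '-'
        {
            int value = multiplicative();
            while (peek() == '+' || peek() == '-')
            {
                char op = d_text[d_pos++];
                int rhs = multiplicative();
                value = op == '+' ? value + rhs : value - rhs;
            }
            return value;
        }

        int multiplicative()            // %left '*' '/'
        {
            int value = unary();
            while (peek() == '*' || peek() == '/')
            {
                char op = d_text[d_pos++];
                int rhs = unary();
                value = op == '*' ? value * rhs : value / rhs;
            }
            return value;
        }

        int unary()                     // the %prec UNARY alternatives
        {
            char next = peek();
            if (next == '-' || next == '+')
            {
                ++d_pos;
                int value = unary();
                return next == '-' ? -value : value;
            }
            return primary();
        }

        int primary()                   // '(' expression ')' or NUMBER
        {
            if (peek() == '(')
            {
                ++d_pos;                // skip '('
                int value = additive();
                if (peek() == ')')
                    ++d_pos;            // skip ')'
                return value;
            }
            int value = 0;
            while (d_pos < d_text.size() &&
                   std::isdigit(static_cast<unsigned char>(d_text[d_pos])))
                value = value * 10 + (d_text[d_pos++] - '0');
            return value;
        }
};

int calc(std::string const &text)
{
    return MiniCalc(text).evaluate();
}

void demo()
{
    assert(calc("1 + 2 * 3") == 7);      // '*' binds tighter than '+'
    assert(calc("7 - 2 - 1") == 4);      // '-' is left-associative
    assert(calc("-(4 - 6) * 5") == 10);  // unary minus binds tightest
}
```

A parser generated from the grammar is expected to print the same values for the same input lines.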

Next, bisonc++ processes this file. In the process, bisonc++ generates the following files from its skeletons:

USING PARSER-CLASS SYMBOLS IN LEXICAL SCANNERS

Note here that although the file parserbase.h, defining the parser class's base class, is included rather than the header file parser.h defining the parser class itself, the lexical scanner may simply return tokens of the class Calculator (e.g., Calculator::NUMBER rather than CalculatorBase::NUMBER). In fact, using a simple #define - #undef pair, generated by bisonc++ at the end of the base class header file and just before the definition of the parser class itself, respectively, it is possible to assume in the lexical scanner that all symbols defined in the parser's base class are actually defined in the parser class itself. It should be noted that this feature can only be used to access the base class's enums and types. The actual parser class is not available by the time the lexical scanner is defined, thus avoiding circular class dependencies.
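The #define - #undef mechanism can be sketched in a single file. The exact text bisonc++ generates is not shown in this man-page, so the placement of the two preprocessor lines, the token values, the helper functions and the placeholder parse() below are assumptions that merely follow the description above.

```cpp
#include <cassert>

// --- contents a base class header might have (hypothetical) ---
class CalculatorBase
{
    public:
        enum Tokens { NUMBER = 257, EOLN };   // illustrative values only
};
// Assumed alias at the end of the base class header: from here on the
// preprocessor replaces the name Calculator by CalculatorBase.
#define Calculator CalculatorBase

// --- scanner code, which has only seen the base class header ---
inline int scannerReturnsNumber()
{
    return Calculator::NUMBER;        // actually CalculatorBase::NUMBER
}

// --- start of a parser class header (hypothetical) ---
#undef Calculator                     // assumed: alias removed again

class Calculator: public CalculatorBase
{
    public:
        int parse()                   // placeholder, no real parsing
        {
            return 0;
        }
};

// Both spellings name the same enumeration value.
inline void demo()
{
    assert(scannerReturnsNumber() == Calculator::NUMBER);
}
```

Since the alias is purely textual, it gives the scanner access to the base class's enums and types only; members declared in the parser class itself remain out of reach, which is what keeps the two classes free of a circular dependency.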

FILES

SEE ALSO

bison(1), bison++(1), bison.info (using texinfo), flex++(1)

Lakos, J. (2001) Large Scale C++ Software Design, Addison Wesley.
Aho, A.V., Sethi, R., Ullman, J.D. (1986) Compilers, Addison Wesley.

BUGS

To avoid collisions with names defined by the parser's (base) class, the following identifiers should not be used as token names:

When re-using files generated by bisonc++ before version 2.0.0, minor hand-modification might be necessary. The identifiers in the following list (defined in the parser's base class) now have two underscores affixed to them: LTYPE, STYPE and Tokens. When using classes derived from the generated parser class, the following identifiers are available in such derived classes: DEFAULT_RECOVERY_MODE, ErrorRecovery, Return, UNEXPECTED_TOKEN, d_debug, d_loc, d_lsp, d_nErrors, d_nextToken, d_state, d_token, d_val, and d_vsp. When used in derived classes, they too need two underscores affixed to them.

The member function void lookup (< 1.00) was replaced by int lookup. When regenerating parsers created by early versions of bisonc++ (versions before version 1.00), lookup's prototype should be corrected by hand, since bisonc++ will not by itself rewrite the parser class's header file.

The Semantic parser, mentioned in bison++(1), is not implemented in bisonc++(1). According to bison++(1) the semantic parser was not available in bison++ either. It is possible that the Pure parser is now available via the --thread-safe option.

ABOUT bisonc++

Bisonc++ was based on bison++, originally developed by Alain Coetmeur (coetmeur@icdc.fr), R&D department (RDT), Informatique-CDC, France, who based his work on bison, GNU version 1.21.

Bisonc++ version 0.98 and beyond is a complete rewrite of an LALR(1) parser generator, closely following the construction process as described in Aho, Sethi and Ullman's (1986) book Compilers (i.e., the Dragon book). It uses the same grammar specification as bison and bison++, and practically the same options and directives as bisonc++ versions earlier than 0.98. Variables, declarations and macros that are obsolete were removed. Since bisonc++ is a completely new program, it will most likely contain bugs. Please report bugs to the author:

AUTHOR

Frank B. Brokken (f.b.brokken@rug.nl).