This article describes common grammar idioms, the building blocks of your grammar.
Writing a grammar is just like programming in a new language. There are certain idioms that you must learn in order to write grammars efficiently. The idioms are the building blocks for your grammar, much as they are in a conventional programming language. This article describes some common idioms that you will use over and over when writing a grammar. In all cases, the label names that we use are the names that SandStone recommends.

An Optional Element (Zero or One)

This construct models an optional element. Use the elementOpt nonterminal in a grammar rule wherever the element may be omitted. The following statements make elementOpt optional.

    ElementNull elementOpt -> ;
    Element elementOpt -> element;

A List of Zero or More Elements

This idiom shows how to make a list of zero or more elements.

    ElementListNull elementList -> ;
    ElementList elementList -> elementList element;

A List of One or More Elements

This is a list of one or more elements.

    ElementListOne elementList -> element;
    ElementList elementList -> elementList element;

A Statement List

This models a statement list. Many file formats can be interpreted as lists of statements, and this is how to write them. If each statement ends in some punctuation character (like a ‘;’ in C or a newline character), adding error recovery is greatly simplified, as the grammar below shows.

    Start start -> start statementWithEnd;
    StartStatement start -> statementWithEnd;
    StatementWithEnd statementWithEnd -> statement punct;
    StatementError statementWithEnd -> %error punct;
    Statement1 statement -> ...;
    Statement2 statement -> ...;
    . . .

Sections

When designing a file format, it is sometimes convenient to break the file into sections, either explicitly or implicitly. The difference can be explained by looking at two languages. One is the Visual Parse++ rule file, which sections the file explicitly. Each section starts with a keyword indicating the format of the statements that can follow, for example %macro or %expression. The other is Modula 2.
Modula 2 also uses keywords to separate the different parts of a program, but the effect is more subtle than in the Visual Parse++ rule files. In both cases, unique tokens indicate the start of a new section in the file. This is a very good practice to get into. The main reason is the one-token lookahead restriction imposed by LALR(1) grammars. Separator or punctuation tokens help disambiguate the grammar and make conflicts much less likely. Sectioning the grammar also makes error recovery easier. Here is how to model sections.

    Start start -> section1 % section2 % ...
    Section1 section1 -> section1Token statements1;
    Section2 section2 -> section2Token statements2;

At a minimum, the synchronization tokens (%) force parsing to resume at the next section if an error occurs. If you also have something like statement lists in each section, they may have their own error recovery. The statements1, statements2, ... nonterminals can be modeled as in the previous idiom, so the grammar becomes several sections of statements.

Believe it or not, this short list of idioms is enough to write most grammars.
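
To make the effect of the statement list error rule concrete, here is a minimal hand-written sketch in Python. It is not Visual Parse++ output and does not use its API. The statement form (NAME = VALUE), the function names, and the choice of ‘;’ as the punct token are all assumptions made for illustration. The sketch only mimics what the %error punct rule achieves: when a statement is malformed, everything up to the next punctuation token is discarded and parsing resumes with the following statement.

```python
from typing import List, Tuple

PUNCT = ";"  # assumed terminator, playing the role of `punct` in the idiom


def parse_statement(tokens: List[str]) -> Tuple[str, str]:
    """statement -> NAME '=' VALUE  (a hypothetical statement form)."""
    if len(tokens) == 3 and tokens[0].isidentifier() and tokens[1] == "=":
        return (tokens[0], tokens[2])
    raise ValueError("bad statement: " + " ".join(tokens))


def parse_statement_list(tokens: List[str]):
    """start -> start statementWithEnd | statementWithEnd."""
    statements: List[Tuple[str, str]] = []
    errors: List[str] = []
    current: List[str] = []
    for tok in tokens:
        if tok != PUNCT:
            current.append(tok)  # still inside the current statement
            continue
        # Reached `punct`: accept the statement or recover from the error.
        try:
            statements.append(parse_statement(current))
        except ValueError as exc:
            # Mirrors `statementWithEnd -> %error punct`: record the error,
            # drop the bad tokens, and continue after the terminator.
            errors.append(str(exc))
        current = []
    if current:
        errors.append("missing terminator after: " + " ".join(current))
    return statements, errors
```

Given the tokens of `a = 1 ; b 2 ; c = 3 ;`, the sketch accepts the first and third statements and records one error for `b 2`, so a single bad statement does not stop the rest of the file from being processed.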