Contents | Prev | Next | Index

CHAPTER 19

LALR(1) Java Grammar

Is there grammar in a title. There is grammar in a title. Thank you.
-Gertrude Stein, "Arthur a Grammar," How to Write (1931)

This chapter presents a grammar for Java. The grammar has been mechanically checked to insure that it is LALR(1).

The grammar for Java presented piecemeal in the preceding chapters is much better for exposition, but it cannot be parsed left-to-right with one token of lookahead because of certain syntactic peculiarities, some of them inherited from C and C++. These problems and the solutions adopted for the LALR(1) grammar are presented below, followed by the grammar itself.

19.1 Grammatical Difficulties

There are five problems with the grammar presented in preceding chapters.

19.1.1 Problem #1: Names Too Specific

Consider the two groups of productions:

PackageName:
Identifier
PackageName . Identifier
TypeName:
Identifier
PackageName . Identifier
and:

MethodName:
Identifier
AmbiguousName . Identifier
AmbiguousName:
Identifier
AmbiguousName . Identifier
Now consider the partial input:

class Problem1 { int m() { hayden.
When the parser is considering the token hayden, with one-token lookahead to symbol ".", it cannot yet tell whether hayden should be a PackageName that qualifies a type name, as in:

hayden.Matrix x, y, z;
or an AmbiguousName that qualifies a method name, as in:

hayden.print("Dinosaur Rex!");
Therefore, the productions shown above result in a grammar that is not LALR(1). There are also other problems with drawing distinctions among different kinds of names in the grammar.

The solution is to eliminate the nonterminals PackageName, TypeName, ExpressionName, MethodName, and AmbiguousName, replacing them all with a single nonterminal Name:

Name:
SimpleName
QualifiedName
SimpleName:
Identifier
QualifiedName:
Name
. Identifier
A later stage of compiler analysis then sorts out the precise role of each name or name qualifier.

For related reasons, these productions in §4.3:

ClassOrInterfaceType:
ClassType
InterfaceType
ClassType:
TypeName
InterfaceType:
TypeName
were changed to:

ClassOrInterfaceType:
Name
ClassType:
ClassOrInterfaceType
InterfaceType:
ClassOrInterfaceType

19.1.2 Problem #2: Modifiers Too Specific

Consider the two groups of productions:

FieldDeclaration:
Field
Modifiersopt Type VariableDeclarators ;
FieldModifiers:
FieldModifier
FieldModifiers FieldModifier
FieldModifier: one of
public protected private
final static transient volatile
and:

MethodHeader:
MethodModifiersopt ResultType MethodDeclarator Throwsopt
MethodModifiers:
MethodModifier
MethodModifiers MethodModifier
MethodModifier: one of
public protected private
static
abstract final native synchronized

Now consider the partial input:

class Problem2 { public static int
When the parser is considering the token static, with one-token lookahead to symbol int-or, worse yet, considering the token public with lookahead to static-it cannot yet tell whether this will be a field declaration such as:

public static int maddie = 0;
or a method declaration such as:

public static int maddie(String art) { return art.length(); }
Therefore, the parser cannot tell with only one-token lookahead whether static (or, similarly, public) should be reduced to FieldModifier or MethodModifier. Therefore, the productions shown above result in a grammar that is not LALR(1). There are also other problems with drawing distinctions among different kinds of modifiers in the grammar.

While not all contexts provoke the problem, the simplest solution is to combine all contexts in which such modifiers are used, eliminating all six of the nonterminals ClassModifiers (§8.1.2), FieldModifiers (§8.3.1), MethodModifiers (§8.4.3), ConstructorModifiers (§8.6.3), InterfaceModifiers (§9.1.2), and ConstantModifiers (§9.3) from the grammar, replacing them all with a single nonterminal Modifiers:

Modifiers:
Modifier
Modifiers
Modifier
Modifier: one of
public protected private
static
abstract final native synchronized transient volatile
A later stage of compiler analysis then sorts out the precise role of each modifier and whether it is permitted in a given context.

19.1.3 Problem #3: Field Declaration versus Method Declaration

Consider the two productions (shown after problem #2 has been corrected):

FieldDeclaration:
Modifiersopt Type VariableDeclarators ;
and:

MethodHeader:
Modifiersopt ResultType MethodDeclarator Throwsopt
where ResultType is defined as:

ResultType:
Type
void
Now consider the partial input:

class Problem3 { int julie
Note that, in this simple example, no Modifiers are present. When the parser is considering the token int, with one-token lookahead to symbol julie, it cannot yet tell whether this will be a field declaration such as:

int julie = 14;
or a method declaration such as:

int julie(String art) { return art.length(); }
Therefore, after the parser reduces int to the nonterminal Type, it cannot tell with only one-token lookahead whether Type should be further reduced to ResultType (for a method declaration) or left alone (for a field declaration). Therefore, the productions shown above result in a grammar that is not LALR(1).

The solution is to eliminate the ResultType production and to have separate alternatives for MethodHeader:

MethodHeader:
Modifiersopt Type MethodDeclarator Throwsopt
Modifiersopt void MethodDeclarator Throwsopt
This allows the parser to reduce int to Type and then leave it as is, delaying the decision as to whether a field declaration or method declaration is in progress.

19.1.4 Problem #4: Array Type versus Array Access

Consider the productions (shown after problem #1 has been corrected):

ArrayType:
Type [ ]
and:

ArrayAccess:
Name [ Expression ]
PrimaryNoNewArray [ Expression ]
Now consider the partial input:

class Problem4 { Problem4() { peter[
When the parser is considering the token peter, with one-token lookahead to symbol [, it cannot yet tell whether peter will be part of a type name, as in:

peter[] team;
or part of an array access, as in:

peter[3] = 12;
Therefore, after the parser reduces peter to the nonterminal Name, it cannot tell with only one-token lookahead whether Name should be reduced ultimately to Type (for an array type) or left alone (for an array access). Therefore, the productions shown above result in a grammar that is not LALR(1).

The solution is to have separate alternatives for ArrayType:

ArrayType:
PrimitiveType [ ]
Name [ ]
ArrayType [ ]
This allows the parser to reduce peter to Name and then leave it as is, delaying the decision as to whether an array type or array access is in progress.

19.1.5 Problem #5: Cast versus Parenthesized Expression

Consider the production:

CastExpression:
( PrimitiveType ) UnaryExpression
( ReferenceType ) UnaryExpressionNotPlusMinus
Now consider the partial input:

class Problem5 { Problem5() { super((matthew)
When the parser is considering the token matthew, with one-token lookahead to symbol ), it cannot yet tell whether (matthew) will be a parenthesized expression, as in:

super((matthew), 9);
or a cast, as in:

super((matthew)baz, 9);
Therefore, after the parser reduces matthew to the nonterminal Name, it cannot tell with only one-token lookahead whether Name should be further reduced to PostfixExpression and ultimately to Expression (for a parenthesized expression) or to ClassOrInterfaceType and then to ReferenceType (for a cast). Therefore, the productions shown above result in a grammar that is not LALR(1).

The solution is to eliminate the use of the nonterminal ReferenceType in the definition of CastExpression, which requires some reworking of both alternatives to avoid other ambiguities:

CastExpression:
( PrimitiveType Dimsopt ) UnaryExpression
( Expression ) UnaryExpressionNotPlusMinus
( Name Dims ) UnaryExpressionNotPlusMinus
This allows the parser to reduce matthew to Expression and then leave it there, delaying the decision as to whether a parenthesized expression or a cast is in progress. Inappropriate variants such as:

(int[])+3
and:

(matthew+1)baz
must then be weeded out and rejected by a later stage of compiler analysis.

The remaining sections of this chapter constitute a LALR(1) grammar for Java syntax, in which the five problems described above have been solved.

19.2 Productions from §2.3: The Syntactic Grammar

Grammar is mistaken at times for burnt ivy with a piece of glass.
-Gertrude Stein, "Arthur a Grammar," How to Write (1931)

Goal:
CompilationUnit

19.3 Productions from §3: Lexical Structure

The question is if you have a vocabulary have you any need of grammar . . .
-Gertrude Stein, "Arthur a Grammar," How to Write (1931)

Literal:
IntegerLiteral
FloatingPointLiteral
BooleanLiteral
CharacterLiteral
StringLiteral
NullLiteral

19.4 Productions from §4: Types, Values, and Variables

Type:
PrimitiveType
ReferenceType
PrimitiveType:
NumericType
boolean
NumericType:
IntegralType
FloatingPointType
IntegralType: one of
byte short int long char
FloatingPointType: one of
float double
ReferenceType:
ClassOrInterfaceType
ArrayType
ClassOrInterfaceType:
Name
ClassType:
ClassOrInterfaceType
InterfaceType:
ClassOrInterfaceType
ArrayType:
PrimitiveType [ ]
Name [ ]
ArrayType [ ]

19.5 Productions from §6: Names

Grammar refers to names as very pretty names.
-Gertrude Stein, "Arthur a Grammar," How to Write (1931)

Name:
SimpleName
QualifiedName
SimpleName:
Identifier
QualifiedName:
Name
. Identifier

19.6 Productions from §7: Packages

CompilationUnit:
PackageDeclarationopt ImportDeclarationsopt TypeDeclarationsopt
ImportDeclarations:
ImportDeclaration
ImportDeclarations ImportDeclaration
TypeDeclarations:
TypeDeclaration
TypeDeclarations TypeDeclaration
PackageDeclaration:
package Name ;
ImportDeclaration:
SingleTypeImportDeclaration
TypeImportOnDemandDeclaration
SingleTypeImportDeclaration:
import Name ;
TypeImportOnDemandDeclaration:
import Name . * ;
TypeDeclaration:
ClassDeclaration
InterfaceDeclaration
;

19.7 Productions Used Only in the LALR(1) Grammar

Modifiers:
Modifier
Modifiers
Modifier
Modifier: one of
public protected private
static
abstract final native synchronized transient volatile

19.8 Productions from §8: Class Declarations

19.8.1 Productions from §8.1: Class Declaration

ClassDeclaration:
Modifiersopt class Identifier Superopt Interfacesopt ClassBody
Super:
extends ClassType
Interfaces:
implements InterfaceTypeList
InterfaceTypeList:
InterfaceType
InterfaceTypeList , InterfaceType
ClassBody:
{ ClassBodyDeclarationsopt }
ClassBodyDeclarations:
ClassBodyDeclaration
ClassBodyDeclarations ClassBodyDeclaration
ClassBodyDeclaration:
ClassMemberDeclaration
StaticInitializer
ConstructorDeclaration
ClassMemberDeclaration:
FieldDeclaration
MethodDeclaration

19.8.2 Productions from §8.3: Field Declarations

FieldDeclaration:
Modifiersopt Type VariableDeclarators ;
VariableDeclarators:
VariableDeclarator
VariableDeclarators , VariableDeclarator
VariableDeclarator:
VariableDeclaratorId
VariableDeclaratorId = VariableInitializer
VariableDeclaratorId:
Identifier
VariableDeclaratorId [ ]
VariableInitializer:
Expression
ArrayInitializer

19.8.3 Productions from §8.4: Method Declarations

MethodDeclaration:
MethodHeader MethodBody
MethodHeader:
Modifiersopt Type MethodDeclarator Throwsopt
Modifiersopt void MethodDeclarator Throwsopt
MethodDeclarator:
Identifier ( FormalParameterListopt )
MethodDeclarator [ ]
FormalParameterList:
FormalParameter
FormalParameterList , FormalParameter
FormalParameter:
Type VariableDeclaratorId
Throws:
throws ClassTypeList
ClassTypeList:
ClassType
ClassTypeList , ClassType
MethodBody:
Block
;

19.8.4 Production from §8.5: Static Initializers

StaticInitializer:
static Block

19.8.5 Productions from §8.6: Constructor Declarations

ConstructorDeclaration:
Modifiersopt ConstructorDeclarator Throwsopt ConstructorBody
ConstructorDeclarator:
SimpleName ( FormalParameterListopt )
ConstructorBody:
{ ExplicitConstructorInvocationopt BlockStatementsopt }
ExplicitConstructorInvocation:
this ( ArgumentListopt ) ;
super ( ArgumentListopt ) ;

19.9 Productions from §9: Interface Declarations

Grammar is useless because there is nothing to say.
-Gertrude Stein, "Arthur a Grammar," How to Write (1931)

19.9.1 Productions from §9.1: Interface Declarations

InterfaceDeclaration:
Modifiersopt interface Identifier ExtendsInterfacesopt InterfaceBody
ExtendsInterfaces:
extends InterfaceType
ExtendsInterfaces , InterfaceType
InterfaceBody:
{ InterfaceMemberDeclarationsopt }
InterfaceMemberDeclarations:
InterfaceMemberDeclaration
InterfaceMemberDeclarations InterfaceMemberDeclaration
InterfaceMemberDeclaration:
ConstantDeclaration
AbstractMethodDeclaration
ConstantDeclaration:
FieldDeclaration
AbstractMethodDeclaration:
MethodHeader ;

19.10 Productions from §10: Arrays

ArrayInitializer:
{ VariableInitializersopt ,opt }
VariableInitializers:
VariableInitializer
VariableInitializers , VariableInitializer

19.11 Productions from §14: Blocks and Statements

Successions of words are so agreeable.
-Gertrude Stein, "Arthur a Grammar," How to Write (1931)

Block:
{ BlockStatementsopt }
BlockStatements:
BlockStatement
BlockStatements BlockStatement
BlockStatement:
LocalVariableDeclarationStatement
Statement
LocalVariableDeclarationStatement:
LocalVariableDeclaration ;
LocalVariableDeclaration:
Type VariableDeclarators
Statement:
StatementWithoutTrailingSubstatement
LabeledStatement
IfThenStatement
IfThenElseStatement
WhileStatement
ForStatement
StatementNoShortIf:
StatementWithoutTrailingSubstatement
LabeledStatementNoShortIf
IfThenElseStatementNoShortIf
WhileStatementNoShortIf
ForStatementNoShortIf
StatementWithoutTrailingSubstatement:
Block
EmptyStatement
ExpressionStatement
SwitchStatement
DoStatement
BreakStatement
ContinueStatement
ReturnStatement
SynchronizedStatement
ThrowStatement
TryStatement
EmptyStatement:
;
LabeledStatement:
Identifier : Statement
LabeledStatementNoShortIf:
Identifier : StatementNoShortIf
ExpressionStatement:
StatementExpression ;
StatementExpression:
Assignment
PreIncrementExpression
PreDecrementExpression
PostIncrementExpression
PostDecrementExpression
MethodInvocation
ClassInstanceCreationExpression

IfThenStatement:
if ( Expression ) Statement
IfThenElseStatement:
if ( Expression ) StatementNoShortIf else Statement
IfThenElseStatementNoShortIf:
if ( Expression ) StatementNoShortIf else StatementNoShortIf
SwitchStatement:
switch ( Expression ) SwitchBlock
SwitchBlock:
{ SwitchBlockStatementGroupsopt SwitchLabelsopt }
SwitchBlockStatementGroups:
SwitchBlockStatementGroup
SwitchBlockStatementGroups SwitchBlockStatementGroup
SwitchBlockStatementGroup:
SwitchLabels BlockStatements
SwitchLabels:
SwitchLabel
SwitchLabels SwitchLabel
SwitchLabel:
case ConstantExpression :
default :
WhileStatement:
while ( Expression ) Statement
WhileStatementNoShortIf:
while ( Expression ) StatementNoShortIf
DoStatement:
do Statement while ( Expression ) ;

ForStatement:
for ( ForInitopt ; Expressionopt ; ForUpdateopt )
Statement
ForStatementNoShortIf:
for ( ForInitopt ; Expressionopt ; ForUpdateopt )
StatementNoShortIf
ForInit:
StatementExpressionList
LocalVariableDeclaration
ForUpdate:
StatementExpressionList
StatementExpressionList:
StatementExpression
StatementExpressionList , StatementExpression
BreakStatement:
break Identifieropt ;
ContinueStatement:
continue Identifieropt ;
ReturnStatement:
return Expressionopt ;
ThrowStatement:
throw Expression ;
SynchronizedStatement:
synchronized ( Expression ) Block
TryStatement:
try Block Catches
try Block Catchesopt Finally
Catches:
CatchClause
Catches CatchClause
CatchClause:
catch ( FormalParameter ) Block
Finally:
finally Block

19.12 Productions from §15: Expressions

Grammar does mean arithmetic.
They act quickly.
Grammar matters if they add quickly.
-Gertrude Stein, "Arthur a Grammar," How to Write (1931)

Primary:
PrimaryNoNewArray
ArrayCreationExpression
PrimaryNoNewArray:
Literal
this
( Expression )
ClassInstanceCreationExpression
FieldAccess
MethodInvocation
ArrayAccess
ClassInstanceCreationExpression:
new ClassType ( ArgumentListopt )
ArgumentList:
Expression
ArgumentList , Expression
ArrayCreationExpression:
new PrimitiveType DimExprs Dimsopt
new ClassOrInterfaceType DimExprs Dimsopt
DimExprs:
DimExpr
DimExprs DimExpr
DimExpr:
[ Expression ]
Dims:
[ ]
Dims [ ]
FieldAccess:
Primary . Identifier
super . Identifier
MethodInvocation:
Name ( ArgumentListopt )
Primary . Identifier ( ArgumentListopt )
super .
Identifier ( ArgumentListopt )
ArrayAccess:
Name [ Expression ]
PrimaryNoNewArray [ Expression ]
PostfixExpression:
Primary
Name
PostIncrementExpression
PostDecrementExpression
PostIncrementExpression:
PostfixExpression ++
PostDecrementExpression:
PostfixExpression --
UnaryExpression:
PreIncrementExpression
PreDecrementExpression
+ UnaryExpression
- UnaryExpression
UnaryExpressionNotPlusMinus
PreIncrementExpression:
++ UnaryExpression
PreDecrementExpression:
-- UnaryExpression
UnaryExpressionNotPlusMinus:
PostfixExpression
~ UnaryExpression
! UnaryExpression
CastExpression
CastExpression:
( PrimitiveType Dimsopt ) UnaryExpression
( Expression ) UnaryExpressionNotPlusMinus
( Name Dims ) UnaryExpressionNotPlusMinus
MultiplicativeExpression:
UnaryExpression
MultiplicativeExpression * UnaryExpression
MultiplicativeExpression / UnaryExpression
MultiplicativeExpression % UnaryExpression
AdditiveExpression:
MultiplicativeExpression
AdditiveExpression + MultiplicativeExpression
AdditiveExpression - MultiplicativeExpression
ShiftExpression:
AdditiveExpression
ShiftExpression << AdditiveExpression
ShiftExpression >> AdditiveExpression
ShiftExpression >>> AdditiveExpression
RelationalExpression:
ShiftExpression
RelationalExpression < ShiftExpression
RelationalExpression > ShiftExpression
RelationalExpression <= ShiftExpression
RelationalExpression >= ShiftExpression
RelationalExpression instanceof ReferenceType
EqualityExpression:
RelationalExpression
EqualityExpression == RelationalExpression
EqualityExpression != RelationalExpression
AndExpression:
EqualityExpression
AndExpression & EqualityExpression
ExclusiveOrExpression:
AndExpression
ExclusiveOrExpression ^ AndExpression
InclusiveOrExpression:
ExclusiveOrExpression
InclusiveOrExpression | ExclusiveOrExpression
ConditionalAndExpression:
InclusiveOrExpression
ConditionalAndExpression && InclusiveOrExpression
ConditionalOrExpression:
ConditionalAndExpression
ConditionalOrExpression || ConditionalAndExpression
ConditionalExpression:
ConditionalOrExpression
ConditionalOrExpression ? Expression : ConditionalExpression
AssignmentExpression:
ConditionalExpression
Assignment
Assignment:
LeftHandSide AssignmentOperator AssignmentExpression
LeftHandSide:
Name
FieldAccess
ArrayAccess
AssignmentOperator: one of
= *= /= %= += -= <<= >>= >>>= &= ^= |=
Expression:
AssignmentExpression
ConstantExpression:
Expression


Contents | Prev | Next | Index

Java API Document (HTML generated by dkramer on July 16, 1996)
Copyright © 1996 Sun Microsystems, Inc. All rights reserved
Please send any comments or corrections to doug.kramer@sun.com