Attribute and Dynamic Scopes

Token attributes

attribute    description
text
type
line
index
pos
channel
tree
int

Rule attributes

Parsers

attribute    description
text
start
stop
tree
st

Tree parsers

attribute    description
text
start
tree
st

Lexers

attribute    description
text
type
line
index
pos
channel
start
stop
int

The Rule text Attribute in Tree Grammars

In a parser grammar, the relationship between the elements matched by a rule and the associated input text is very clear. A rule begins parsing at a particular token and stops parsing at a particular token. The text attribute for a rule, $text, is simply the concatenated text from all tokens in that range, including hidden channel tokens. What does $text mean in a tree grammar, though?
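The parser-side definition can be illustrated with a small, self-contained Java sketch. It does not use the ANTLR runtime: SimpleToken and ParserRuleText are hypothetical stand-ins invented for this illustration, and the hidden flag only models the idea of a hidden channel.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a token; this is not the ANTLR Token interface. */
class SimpleToken {
    final String text;
    final boolean hidden; // true if the token sits on a hidden channel (e.g. whitespace)

    SimpleToken(String text, boolean hidden) {
        this.text = text;
        this.hidden = hidden;
    }
}

class ParserRuleText {
    /** Concatenate the text of every token from start to stop inclusive, hidden or not. */
    static String textOf(List<SimpleToken> tokens, int start, int stop) {
        StringBuilder buf = new StringBuilder();
        for (int i = start; i <= stop; i++) {
            buf.append(tokens.get(i).text);
        }
        return buf.toString();
    }

    /** Builds the token stream for "int x;" and returns the text of a rule spanning all of it. */
    static String demo() {
        List<SimpleToken> tokens = Arrays.asList(
            new SimpleToken("int", false),
            new SimpleToken(" ", true),   // whitespace on the hidden channel
            new SimpleToken("x", false),
            new SimpleToken(";", false));
        return textOf(tokens, 0, tokens.size() - 1);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "int x;"
    }
}
```

The hidden whitespace token is part of the result because the range is defined by token indexes, not by what the parser actually consumed.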

Tree grammar rules match nodes and trees, not tokens. Fortunately, each node has an associated token start and stop index (see TreeAdaptor). As the parser builds trees, each rule sets the token indexes for its return AST to the start and stop token of that rule. We can then define the text attribute for a tree grammar rule to be the concatenated text of the token range recorded in the root of the first tree matched by the rule. This definition may seem strange, but it is the most efficient implementation and works in almost all situations. Here are a few examples:

/** match tree created from, e.g., "int x;"
 *  $text would, therefore, be "int x;"
 *  $start node is VAR node.
 */
variable
    :   ^(VAR type ID) // $text derived from indexes in VAR node
    ;
/** match tree node created from, e.g., "int"
 *  $text would, therefore, be "int"
 *  $start node is 'int' node.
 */
type:   'int'          // $text derived from indexes in 'int' node
    |   'void'
    ;

The following code embodies the text attribute definition. The token range from a rule's start node defines the range of text for the entire rule.

// input is a TreeNodeStream implementation
int start = input.getTreeAdaptor().getTokenStartIndex($start);
int stop = input.getTreeAdaptor().getTokenStopIndex($start);
String text = input.getTokenStream().toString(start, stop);
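To see the same lookup without the ANTLR runtime, here is a minimal standalone sketch. RangeNode and TreeRuleText are hypothetical names made up for this example, and the token stream is modeled as a plain list of strings. It only mimics the idea of reading the token range from the rule's start node.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical AST node carrying the token range recorded by the parser. */
class RangeNode {
    final String label;
    final int tokenStart; // index of the first token covered by this subtree
    final int tokenStop;  // index of the last token covered by this subtree

    RangeNode(String label, int tokenStart, int tokenStop) {
        this.label = label;
        this.tokenStart = tokenStart;
        this.tokenStop = tokenStop;
    }
}

class TreeRuleText {
    /** Text of a tree grammar rule: the token range stored in its start node. */
    static String textOf(List<String> tokens, RangeNode startNode) {
        return String.join("", tokens.subList(startNode.tokenStart, startNode.tokenStop + 1));
    }

    // Tokens for "int x;" including the whitespace token.
    static final List<String> TOKENS = Arrays.asList("int", " ", "x", ";");

    /** Rule variable: its start node is the VAR root, which spans the whole declaration. */
    static String variableText() {
        return textOf(TOKENS, new RangeNode("VAR", 0, 3));
    }

    /** Rule type: its start node is the 'int' node, which spans a single token. */
    static String typeText() {
        return textOf(TOKENS, new RangeNode("int", 0, 0));
    }

    public static void main(String[] args) {
        System.out.println(variableText()); // prints "int x;"
        System.out.println(typeText());     // prints "int"
    }
}
```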

Be careful when referencing the text of a rule that happens to match the root of a tree. The text of a rule is the text of all tokens underneath the first root matched by the rule. In the following example, rule op matches a single node, but $op.text will include the text associated with the two operands as well. The parser that builds the plus and multiply operator nodes will set the token range to include all tokens for that expression.

/** match subtrees for + and * created from input such as "1+4*2"
 *  $text and $op.text are "1+4*2" for first alternative.
 *  $text is just the text of the INT node for second alternative.
 */
expr:   ^(op expr expr)   // $op.text is same as $text!
    |   INT
    ;
op  :   o='+' | o='*'     // $text includes text of operands
    ;                     // $o.text is just node's text

Note that the text for a node label is always just the string returned from getText() invoked on that node, whereas the text for a rule reference is always the text for the whole tree rooted at the node that the rule matched.
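The difference between the two kinds of reference can be sketched in plain Java. OpNode and LabelVersusRuleText are hypothetical names used only for illustration; the sketch assumes the parser stored the full expression range in each operator node, as described above.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical operator node: its own text plus the token range of its whole subtree. */
class OpNode {
    final String text;    // what getText() on the node itself would return
    final int tokenStart; // first token of the expression rooted here
    final int tokenStop;  // last token of the expression rooted here

    OpNode(String text, int tokenStart, int tokenStop) {
        this.text = text;
        this.tokenStart = tokenStart;
        this.tokenStop = tokenStop;
    }
}

class LabelVersusRuleText {
    // Tokens for "1+4*2".
    static final List<String> TOKENS = Arrays.asList("1", "+", "4", "*", "2");

    /** Models $o.text: just the node's own text. */
    static String labelText(OpNode node) {
        return node.text;
    }

    /** Models $op.text: all tokens in the range recorded in the node. */
    static String ruleText(OpNode node) {
        return String.join("", TOKENS.subList(node.tokenStart, node.tokenStop + 1));
    }

    public static void main(String[] args) {
        OpNode plus = new OpNode("+", 0, 4);  // root of 1+4*2
        OpNode times = new OpNode("*", 2, 4); // root of 4*2
        System.out.println(labelText(plus));  // prints "+"
        System.out.println(ruleText(plus));   // prints "1+4*2"
        System.out.println(ruleText(times));  // prints "4*2"
    }
}
```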

Finally, here is the case where the definition of the text attribute does not do what you expect. The text attribute is derived from the first node matched by a rule, so a rule such as slist that matches multiple subtrees has an ill-defined text attribute: it only gives you the text for the first statement subtree.

func:   'void' ID '()' slist ; // $slist.text is text from first tree only
slist:  stat+ ;

In general, you just need to keep this in mind--the text attribute is natural in most cases.
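The slist case can be reproduced with a small standalone sketch. StatNode and SlistText are hypothetical names, the statements "x=1; y=2;" are an invented input, and spanningText is only one possible workaround (taking the range from the first subtree's start to the last subtree's stop), not something the ANTLR runtime is claimed to do for you.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical statement subtree root that records the token range it covers. */
class StatNode {
    final int tokenStart;
    final int tokenStop;

    StatNode(int tokenStart, int tokenStop) {
        this.tokenStart = tokenStart;
        this.tokenStop = tokenStop;
    }
}

class SlistText {
    // Tokens for "x=1; y=2;" including the whitespace between the statements.
    static final List<String> TOKENS =
        Arrays.asList("x", "=", "1", ";", " ", "y", "=", "2", ";");

    // The two subtrees matched by stat+.
    static final List<StatNode> STATS =
        Arrays.asList(new StatNode(0, 3), new StatNode(5, 8));

    static String join(int start, int stop) {
        return String.join("", TOKENS.subList(start, stop + 1));
    }

    /** Models $slist.text: only the range of the first matched subtree is used. */
    static String firstTreeText(List<StatNode> stats) {
        StatNode first = stats.get(0);
        return join(first.tokenStart, first.tokenStop);
    }

    /** Possible workaround: span from the first subtree's start to the last subtree's stop. */
    static String spanningText(List<StatNode> stats) {
        StatNode first = stats.get(0);
        StatNode last = stats.get(stats.size() - 1);
        return join(first.tokenStart, last.tokenStop);
    }

    public static void main(String[] args) {
        System.out.println(firstTreeText(STATS)); // prints "x=1;"
        System.out.println(spanningText(STATS));  // prints "x=1; y=2;"
    }
}
```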

Rule scopes

Global shared scopes