Attribute and Dynamic Scopes
Token attributes
attribute | description |
---|---|
text | |
type | |
line | |
index | |
pos | |
channel | |
tree | |
int | |
Rule attributes
Parsers
attribute | description |
---|---|
text | |
start | |
stop | |
tree | |
st | |
Tree parsers
attribute | description |
---|---|
text | |
start | |
tree | |
st | |
Lexers
attribute | description |
---|---|
text | |
type | |
line | |
index | |
pos | |
channel | |
start | |
stop | |
int | |
The Rule text Attribute in Tree Grammars
In a parser grammar, the relationship between the elements matched by a rule and the associated input text is very clear. A rule begins parsing at a particular token and stops parsing at a particular token. The text attribute for a rule, $text, is simply the concatenated text from all tokens in that range, including hidden channel tokens. What does $text mean in a tree grammar, though?
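Before turning to tree grammars, the parser-side definition can be sketched in plain Java. This is only an illustration of the concatenation described above, not ANTLR's implementation: the `Tok` class, the `textOf` helper, and the sample token list are hypothetical names introduced here.

```java
import java.util.Arrays;
import java.util.List;

final class ParserRuleText {
    /** Hypothetical stand-in for a token: its text plus whether it is on a hidden channel. */
    static final class Tok {
        final String text;
        final boolean hidden; // true for tokens on a hidden channel, e.g. whitespace

        Tok(String text, boolean hidden) {
            this.text = text;
            this.hidden = hidden;
        }
    }

    /** Concatenates the text of every token in [start, stop], hidden tokens included. */
    static String textOf(List<Tok> tokens, int start, int stop) {
        StringBuilder buf = new StringBuilder();
        for (int i = start; i <= stop; i++) {
            buf.append(tokens.get(i).text); // no channel filtering, on purpose
        }
        return buf.toString();
    }

    /** Token stream for the input "int x;" with the space on a hidden channel. */
    static List<Tok> sampleTokens() {
        return Arrays.asList(
            new Tok("int", false),
            new Tok(" ", true),
            new Tok("x", false),
            new Tok(";", false));
    }

    public static void main(String[] args) {
        // A rule that starts at token 0 and stops at token 3 spans the whole input.
        System.out.println(textOf(sampleTokens(), 0, 3)); // prints "int x;"
    }
}
```

Because the hidden-channel space is not filtered out, the rule text reproduces the original input exactly rather than a whitespace-stripped version of it.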
Tree grammar rules match nodes and trees, not tokens. Fortunately, each node has an associated token start and stop index (see TreeAdaptor). As the parser builds trees, each rule sets the token indexes for its return AST to the start and stop token of that rule. We can then define the text attribute for a tree grammar rule to be the text concatenated from the token range recorded in the root of the first tree matched by the rule. This definition may seem strange, but it is the most efficient implementation and works in almost all situations. Here are a few examples:
    /** match tree created from, e.g., "int x;"
     *  $text would, therefore, be "int x;"
     *  $start node is VAR node.
     */
    variable
        : ^(VAR type ID) // $text derived from indexes in VAR node
        ;

    /** match tree node created from, e.g., "int"
     *  $text would, therefore, be "int"
     *  $start node is 'int' node.
     */
    type: 'int' // $text derived from indexes in 'int' node
        | 'void'
        ;
The following code embodies the text attribute definition. The token range from a rule's start node defines the range of text for the entire rule.
    // input is a TreeNodeStream implementation
    int start = input.getTreeAdaptor().getTokenStartIndex($start);
    int stop = input.getTreeAdaptor().getTokenStopIndex($start);
    String text = input.getTokenStream().toString(start, stop);
Be careful when referencing the text of a rule that happens to match the root of a tree. The text of a rule is the text of all tokens underneath the first root matched by the rule. In the following example, rule op matches a single node, but $op.text will include the text associated with the two operands as well. The parser that builds the plus and multiply operator nodes will set the token range to include all tokens for that expression.
    /** match subtrees for + and * created from input such as "1+4*2"
     *  $text and $op.text are "1+4*2" for first alternative.
     *  $text is just the INT node for second alternative.
     */
    expr: ^(op expr expr) // $op.text is same as $text!
        | INT
        ;

    op  : o='+' | o='*' // $text includes text of operands
        ;               // $o.text is just node's text
Note that the text for a node label is always just the string returned from getText() invoked on that node, whereas the text for a rule reference is always the text for the tree rooted at the node that rule matched.
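The difference between the two kinds of text can be modeled in a few lines of plain Java. This is a minimal sketch that does not use the ANTLR runtime: the `Node` class, `labelText`, `ruleText`, and the token list for "1+4*2" are hypothetical, and the token ranges are the ones a parser would be expected to record for that input.

```java
import java.util.Arrays;
import java.util.List;

final class LabelVersusRuleText {
    /** Hypothetical AST node: its own text plus the token range recorded by the parser. */
    static final class Node {
        final String text;    // what getText() would return for this node
        final int tokenStart; // index of the first token covered by the subtree
        final int tokenStop;  // index of the last token covered by the subtree

        Node(String text, int tokenStart, int tokenStop) {
            this.text = text;
            this.tokenStart = tokenStart;
            this.tokenStop = tokenStop;
        }
    }

    /** Text for a node label such as $o.text: only the node's own text. */
    static String labelText(Node node) {
        return node.text;
    }

    /** Text for a rule reference such as $op.text: every token in the node's range. */
    static String ruleText(List<String> tokens, Node firstNode) {
        // subList's upper bound is exclusive, so add 1 to include the stop token.
        return String.join("", tokens.subList(firstNode.tokenStart, firstNode.tokenStop + 1));
    }

    /** Tokens for the input "1+4*2", at indexes 0 through 4. */
    static List<String> sampleTokens() {
        return Arrays.asList("1", "+", "4", "*", "2");
    }

    public static void main(String[] args) {
        Node plus = new Node("+", 0, 4);  // root of the whole expression
        Node times = new Node("*", 2, 4); // root of the "4*2" subtree
        System.out.println(labelText(plus));                 // prints "+"
        System.out.println(ruleText(sampleTokens(), plus));  // prints "1+4*2"
        System.out.println(ruleText(sampleTokens(), times)); // prints "4*2"
    }
}
```

The same `plus` node yields "+" through its label and "1+4*2" through the rule, which is the behavior the example grammar's comments describe.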
Finally, here is the case where the definition of the text attribute does not do what you expect. The text attribute is derived from the first node matched by a rule, so a rule such as slist, which matches multiple subtrees, has an ill-defined text attribute: it only gives you the text for the first statement subtree.
    func: 'void' ID '()' slist ; // $slist.text is text from first tree only

    slist: stat+ ;
In general, you just need to keep this limitation in mind; the text attribute behaves naturally in most cases.
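If the text of every subtree is needed, one possible workaround (an assumption of this sketch, not something prescribed here) is to span from the first subtree's start index to the last subtree's stop index. The plain-Java model below contrasts that with the default first-subtree behavior; `Node`, `firstSubtreeText`, `spanningText`, and the sample data are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class StatementListText {
    /** Hypothetical subtree root carrying the token range recorded by the parser. */
    static final class Node {
        final int tokenStart;
        final int tokenStop;

        Node(int tokenStart, int tokenStop) {
            this.tokenStart = tokenStart;
            this.tokenStop = tokenStop;
        }
    }

    /** Joins the tokens in the inclusive range [start, stop]. */
    static String range(List<String> tokens, int start, int stop) {
        return String.join("", tokens.subList(start, stop + 1));
    }

    /** Models the default definition: only the first subtree contributes. */
    static String firstSubtreeText(List<String> tokens, List<Node> subtrees) {
        Node first = subtrees.get(0); // stat+ guarantees at least one subtree
        return range(tokens, first.tokenStart, first.tokenStop);
    }

    /** Assumed workaround: span from the first subtree's start to the last one's stop. */
    static String spanningText(List<String> tokens, List<Node> subtrees) {
        Node first = subtrees.get(0);
        Node last = subtrees.get(subtrees.size() - 1);
        return range(tokens, first.tokenStart, last.tokenStop);
    }

    /** Tokens for the statements "x=1;y=2;", at indexes 0 through 7. */
    static List<String> sampleTokens() {
        return Arrays.asList("x", "=", "1", ";", "y", "=", "2", ";");
    }

    /** Two statement subtrees covering tokens 0..3 and 4..7. */
    static List<Node> sampleStatements() {
        return Arrays.asList(new Node(0, 3), new Node(4, 7));
    }

    public static void main(String[] args) {
        System.out.println(firstSubtreeText(sampleTokens(), sampleStatements())); // prints "x=1;"
        System.out.println(spanningText(sampleTokens(), sampleStatements()));     // prints "x=1;y=2;"
    }
}
```

In a real tree grammar the same idea would presumably be applied to the start and stop indexes of the first and last stat subtrees, using the TreeAdaptor calls shown earlier.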