
...

Code Block
startTag : TAG_START_OPEN GENERIC_ID TAG_CLOSE ;

Here, GENERIC_ID is the name of that specific start tag. Note that token names are usually all upper case, while token parser rule names at least start with a lower-case character. Other than that, the syntax of the lexer and token parser descriptions is unified: the general rule structure and most of the expressions, such as ()+ and ()*, are the same in the lexer and in the token parser. This is good, as you only have to learn one language! But wait, our rule isn't quite complete; we forgot about attributes:

Code Block
startTag : TAG_START_OPEN GENERIC_ID (attribute)* TAG_CLOSE ;

attribute : GENERIC_ID ATTR_EQ ATTR_VALUE ;
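To see what the (attribute)* loop means in practice, here is a minimal hand-written sketch in Python of what a recursive-descent parser for these two rules might do. It is not the code ANTLR generates. The function names match, attribute and start_tag are hypothetical, and representing tokens as plain strings holding token type names is an assumption made purely for illustration.

```python
# Hand-written sketch of the two parser rules above. Tokens are plain
# strings holding token type names (an assumption for illustration only).

def match(tokens, pos, expected):
    """Consume one token of the expected type and return the next position."""
    if pos >= len(tokens) or tokens[pos] != expected:
        found = tokens[pos] if pos < len(tokens) else "end of input"
        raise SyntaxError(f"expected {expected} at position {pos}, found {found}")
    return pos + 1

def attribute(tokens, pos):
    # attribute : GENERIC_ID ATTR_EQ ATTR_VALUE ;
    pos = match(tokens, pos, "GENERIC_ID")
    pos = match(tokens, pos, "ATTR_EQ")
    return match(tokens, pos, "ATTR_VALUE")

def start_tag(tokens, pos=0):
    # startTag : TAG_START_OPEN GENERIC_ID (attribute)* TAG_CLOSE ;
    pos = match(tokens, pos, "TAG_START_OPEN")
    pos = match(tokens, pos, "GENERIC_ID")
    # (attribute)* : repeat as long as the next token can start an attribute
    while pos < len(tokens) and tokens[pos] == "GENERIC_ID":
        pos = attribute(tokens, pos)
    return match(tokens, pos, "TAG_CLOSE")

# Token types a lexer might produce for a start tag with one attribute
tokens = ["TAG_START_OPEN", "GENERIC_ID",
          "GENERIC_ID", "ATTR_EQ", "ATTR_VALUE",
          "TAG_CLOSE"]
consumed = start_tag(tokens)  # number of tokens the rule consumed
```

The while loop is the whole story behind ()*: one token of lookahead decides whether another attribute follows or the rule moves on to TAG_CLOSE. A sequence that starts an attribute but does not finish it raises SyntaxError in this sketch.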

The attribute definition goes into a separate rule so that it can be reused as a subrule in the definition of the empty element tag:

Code Block
emptyElement : TAG_START_OPEN GENERIC_ID (attribute)* TAG_EMPTY_CLOSE ;

Finally, the definition of the end tag, which is easy:

Code Block
endTag : TAG_END_OPEN GENERIC_ID TAG_CLOSE ;

Using this grammar, ANTLR can now identify all three kinds of tags: start tags, end tags and empty element tags. This is the complete first version of the grammar:

Code Block
parser grammar XMLParser;

startTag : TAG_START_OPEN GENERIC_ID (attribute)* TAG_CLOSE ;
attribute : GENERIC_ID ATTR_EQ ATTR_VALUE ;
endTag : TAG_END_OPEN GENERIC_ID TAG_CLOSE ;
emptyElement : TAG_START_OPEN GENERIC_ID (attribute)* TAG_EMPTY_CLOSE ;
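As a quick way to check which token sequences each rule of this first version accepts, the sketch below rewrites the rules as regular expressions over token type names. This works only because none of the rules is recursive. It is a stand-in technique for experimenting, not how ANTLR works (ANTLR generates a recursive-descent parser). The names RULES and classify are hypothetical, and so is the input format, a list of token type names.

```python
import re

# Each rule of the grammar, rewritten as a regular expression over
# space-separated token type names (possible because no rule is recursive).
ATTRIBUTE = r"GENERIC_ID ATTR_EQ ATTR_VALUE"
RULES = {
    "startTag": rf"TAG_START_OPEN GENERIC_ID( {ATTRIBUTE})* TAG_CLOSE",
    "endTag": r"TAG_END_OPEN GENERIC_ID TAG_CLOSE",
    "emptyElement": rf"TAG_START_OPEN GENERIC_ID( {ATTRIBUTE})* TAG_EMPTY_CLOSE",
}

def classify(token_types):
    """Return the name of the rule matching the whole sequence, or None."""
    text = " ".join(token_types)
    for name, pattern in RULES.items():
        if re.fullmatch(pattern, text):
            return name
    return None

# Hypothetical token sequence for an end tag
kind = classify(["TAG_END_OPEN", "GENERIC_ID", "TAG_CLOSE"])
```

Note that startTag and emptyElement share everything except their final token, so a sequence can only be told apart once its last token has been seen.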

Save it to a file (I use xmlParser-version1.g) and run it through ANTLR, just like the lexer grammar:

...