Thread: Java tip Dec 12, 2010: Useful rule tidbits for ANTLR

  1. #1
    helloworld922 (Super Moderator)

    Introduction

    ANTLR is a tool that generates lexers and parsers from a grammar describing a language. It can be used to create programs which need an interpreter, or even to write a compiler for an existing or new language.

    In the short time that I've used ANTLR I can't say I'm ready to provide a full tutorial for it (nor do I currently have the time). However, someone has already taken the time to provide video tutorials for ANTLR 3 (as well as written tutorials for ANTLR 2). These tutorials focus on using ANTLR, as well as a plugin for Eclipse that makes it easy to write and test ANTLR-generated code.

    Disclaimer: I didn't make these videos, nor have I watched all of them yet. However, I have seen many of them and I believe that they are well done and a valuable source of information.

    Here's a link to these video tutorials: ANTLR 3.x Tutorial videos on Vimeo
    Here is also a link to the main ANTLR website: ANTLR Parser Generator

    In this tip I'm posting some useful rules for matching various common literals.

    Difficulty: Medium-hard. Defining a grammar is different from conventional programming and takes some time to get used to. Also, an in-depth understanding of the information in all 9 of these video tutorials is required if you want to create a complex language. However, if you follow through just a few of these tutorials you can already perform some powerful tasks, and these are not too difficult.

    Integer rule

    This rule will match Java-style integers entered in decimal notation (base 10). This is slightly different from the rule presented in the video tutorials because a number with leading zeros, such as 00001239, is not a decimal literal in Java: a leading 0 marks an octal (base 8) literal. This rule will match a plain 0, or a non-zero digit followed by any number of digits.

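    /**
     * Any decimal integer literal
     */
    IntegerLiteral :
    	'0' | (('1'..'9') Digit*);	// Digit*, not Digit+, so single digits 1-9 match

    // Helper assumed by the rules in this post; omit it if your grammar already defines Digit
    fragment Digit :
    	'0'..'9';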

    Note: this rule doesn't match negative integers. It's easier to parse the negative as a negation operator and handle negative integers this way.
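
    To try out which inputs the integer rule is meant to accept without generating a lexer, here is a minimal Java sketch that mirrors the rule as described in the text (a plain 0, or a non-zero digit followed by any number of digits) with a regular expression. The class and method names are made up for this illustration, and the regex is only a stand-in for the ANTLR rule, not something ANTLR produces.

```java
import java.util.regex.Pattern;

/** Regex stand-in for the decimal integer rule described above (illustrative only). */
class DecimalIntegerCheck {
    // A lone 0, or a non-zero digit followed by zero or more digits.
    private static final Pattern DECIMAL_INT = Pattern.compile("0|[1-9][0-9]*");

    /** Returns true if the whole text is a decimal integer literal under that rule. */
    static boolean isDecimalInteger(String text) {
        return DECIMAL_INT.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"0", "7", "1239", "00001239", "-5"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isDecimalInteger(sample));
        }
    }
}
```

    The inputs 00001239 and -5 are rejected: the first because of its leading zeros, the second because the minus sign is left for the parser to treat as a negation operator.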

    Float rule
    Here's a rule for matching Java-style floating point literals. Similar notation is used in other languages. The rule covers three forms: decimal notation (1.23), exponential notation (1453e-4), and an integer with an f at the end (132f). It builds on the IntegerLiteral rule above, which it uses for the f-suffixed form; exponents are matched with Digit+.

    /**
     * Any floating point literal
     */
    FloatLiteral :
    	((('0.' Digit* ) | (('1'..'9') Digit* ('.' Digit*)? )) ('e' '-'? Digit+)?) | (IntegerLiteral 'f')
    	;

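    Note: As with the integer rule, this rule won't directly match negative numbers (negative exponents are matched, though). It's much easier to deal with these in the semantics by matching the sign as the negation operator. Also note that, as written, everything after the leading digits is optional, so a plain integer such as 123 matches this rule too and overlaps with IntegerLiteral.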
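
    As a quick way to see what the float rule accepts, here is a small Java sketch that transcribes the FloatLiteral rule as written above into a regular expression. The names are hypothetical, the IntegerLiteral part is assumed to be "a plain 0, or a non-zero digit followed by any number of digits", and the regex only approximates how a generated lexer would behave on whole inputs.

```java
import java.util.regex.Pattern;

/** Regex transcription of the FloatLiteral rule above (illustrative only). */
class FloatLiteralCheck {
    // ('0.' Digit*) | ('1'..'9' Digit* ('.' Digit*)?), then an optional 'e' '-'? Digit+
    private static final String DECIMAL_FORM =
            "(?:0\\.[0-9]*|[1-9][0-9]*(?:\\.[0-9]*)?)(?:e-?[0-9]+)?";
    // IntegerLiteral 'f'
    private static final String SUFFIX_FORM = "(?:0|[1-9][0-9]*)f";

    private static final Pattern FLOAT =
            Pattern.compile(DECIMAL_FORM + "|" + SUFFIX_FORM);

    /** Returns true if the whole text matches the transcribed rule. */
    static boolean isFloatLiteral(String text) {
        return FLOAT.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"1.23", "1453e-4", "132f", "1.5f", "-1.5"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isFloatLiteral(sample));
        }
    }
}
```

    The three forms from the text (1.23, 1453e-4, 132f) are accepted. Under this transcription 1.5f is not, because the f suffix is only attached to the integer form, and -1.5 is not, because the sign is left to the parser.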

    C-style string and character literals rules

    Here is a rule I've found effective for finding C-style strings. This notation is used by Java, C#, and a wide variety of programming languages. A nearly identical rule can be used for parsing character literals.

    STRING:
    	'"'
    	( ( '\\' ~('\r' | '\n'))
    	| ( ~('\\' | '"' | '\r' | '\n')))+
    	'"'
    	;
     
    CHAR:
    	'\'' 
    	( ( '\\' ~('\r' | '\n'))
    	| ( ~('\\' | '\'' | '\r' | '\n')))+
    	'\''
    	;

    Notes: First, these rules accept any character (other than a line break) after the escape character (the backslash), even though only a limited set of escape sequences is actually valid. I've found that accepting everything here and reporting invalid escapes during semantic analysis works better than hard-coding the allowed escapes in the lexer. Additionally, the character literal rule matches character literals of any length. For example, this would result in a positive match:
    'abcdef abd'
    Again, this is something that is better handled in the semantics than in the lexer rule definition.

    Second, these rules keep the surrounding single quotes and double quotes in the matched text. It's possible to trim these off immediately; video tutorial #6 (about 1/3 of the way in) shows how.
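
    The semantic checks mentioned in these notes could look something like the following Java sketch. It takes the text of a STRING or CHAR token, strips the surrounding quotes, decodes the escapes, and rejects unknown escapes and character literals that are not exactly one character long. The class and method names are hypothetical, and the list of accepted escapes is an assumed subset chosen for the example, not a complete list for any particular language.

```java
/** Post-lexing checks for STRING and CHAR token text (illustrative only). */
class LiteralSemantics {
    /** Strips the surrounding quotes and decodes escape sequences. */
    static String unquote(String tokenText) {
        // Drop the first and last character (the quotes kept by the lexer rule).
        String body = tokenText.substring(1, tokenText.length() - 1);
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i + 1 >= body.length()) {
                throw new IllegalArgumentException("Dangling backslash in " + tokenText);
            }
            char escaped = body.charAt(++i);
            switch (escaped) {  // assumed subset of escapes
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case '0': out.append('\0'); break;
                case '\\':
                case '"':
                case '\'': out.append(escaped); break;
                default:
                    throw new IllegalArgumentException("Unknown escape sequence: \\" + escaped);
            }
        }
        return out.toString();
    }

    /** Decodes a CHAR token and checks that it holds exactly one character. */
    static char charValue(String tokenText) {
        String value = unquote(tokenText);
        if (value.length() != 1) {
            throw new IllegalArgumentException(
                    "Character literal must contain exactly one character: " + tokenText);
        }
        return value.charAt(0);
    }
}
```

    With this in place, a token such as 'abcdef abd' still gets through the lexer but is rejected by charValue, which is where a meaningful error message can be produced.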

    Lastly, I'm not quite sure why these rules work. In my opinion the middle section should use a * (zero or more) rather than a + (one or more), since with the + an empty string literal ("") is not matched. I haven't figured out why this only works with the +; perhaps at some point I will figure this out and update the information here.


  3. #2
    JavaPF

    Thanks for these contributions. They are great!
    Soon we will be able to promote these posts to front page articles.