Welcome to the Java Programming Forums


The professional, friendly Java community. 21,500 members and growing!


The Java Programming Forums are a community of Java programmers from all around the World. Our members have a wide range of skills and they all have one thing in common: A passion to learn and code Java. We invite beginner Java programmers right through to Java professionals to post here and share your knowledge. Become a part of the community, help others, expand your knowledge of Java and enjoy talking with like minded people. Registration is quick and best of all free. We look forward to meeting you.


>> REGISTER NOW TO START POSTING


Members have full access to the forums. Advertisements are removed for registered users.

Results 1 to 8 of 8

Thread: Best way to implement an abstract syntax tree?

  1. #1
    Member
    Join Date
    Mar 2011
    Location
    Earth!
    Posts
    77
    Thanks
    2
    Thanked 1 Time in 1 Post

    Default Best way to implement an abstract syntax tree?

    Hi all.

    I am making a small scripting language for fun. Have managed to get the tokenizer working, but now I have another problem. Basically, I am dividing the source code into a series of tokens with the tokenizer and then I am going to use those tokens to build a tree structure representing the code (called an abstract syntax tree). Thing is I am not sure what the best way to implement one of those is. Here is the wikipedia article about it:
    Abstract syntax tree - Wikipedia, the free encyclopedia
    I am aware that it is a rather fuzzy question, but do anyone know the best, or at least a decent, way to do this? Implement an abstract syntax tree, that is?

    Any help would be greatly appriciated .

    Take care,
    Kerr


  2. #2
    Administrator copeg's Avatar
    Join Date
    Oct 2009
    Location
    US
    Posts
    5,320
    Thanks
    181
    Thanked 833 Times in 772 Posts
    Blog Entries
    5

    Default Re: Best way to implement an abstract syntax tree?

    Do you know how to implement a tree data structure? If not, this should be the first place to start. There are lots of tutorials online - here is a descent one that concentrates on binary trees but gives a good overview of some of the fundamental concepts
    Binary Trees

  3. #3
    Member
    Join Date
    Mar 2011
    Location
    Earth!
    Posts
    77
    Thanks
    2
    Thanked 1 Time in 1 Post

    Default Re: Best way to implement an abstract syntax tree?

    Quote Originally Posted by copeg View Post
    Do you know how to implement a tree data structure? If not, this should be the first place to start. There are lots of tutorials online - here is a descent one that concentrates on binary trees but gives a good overview of some of the fundamental concepts
    Binary Trees
    I know how a tree structure works. It is the case of a abstract syntax tree I am unsure of. For example, if I want to implement a function that generates fibonacci numbers, I would maybe do something like this (in my scripting language):
    func fib[num]
        if num < 2
            return num
        end
        return fib[num-1] + fib[num-2]
    end
    What would the abstract syntax tree look like? How would I represent the different parts (like the function definition and the if statement)?

  4. #4
    Administrator copeg's Avatar
    Join Date
    Oct 2009
    Location
    US
    Posts
    5,320
    Thanks
    181
    Thanked 833 Times in 772 Posts
    Blog Entries
    5

    Default Re: Best way to implement an abstract syntax tree?

    You need to first define the relationship between the nodes and what the nodes and their organization represent (code and code structure). You could start by first creating a Node class (containing references to the parent and any children). From there you can define different nodes for different purposes (comparisons, control statements, variables, method calls, etc...) which could in part define how the tree will be traversed. How you design this is up to you (make Node abstract, create an interface that you pass to Node upon instantiation, etc...).

    For the example you posted, the func might define a node with one parent (the parent being a node representing the parameter value) and one child - the child being a comparison Node (evaluates to a boolean). The comparison Node in turn could have 2 child nodes - one of which returns the num value to its parent and the other returns the evaluation of the method call - how these are chose is based upon the evaluation of the boolean.

    My advice would be to start with something more trivial than a recursive method. Start with control structures and variable definitions, then move to method calls and things more complex. This is an excellent problem however, and thanks for asking the question because it has peaked my interest

  5. #5
    Member
    Join Date
    Mar 2011
    Location
    Earth!
    Posts
    77
    Thanks
    2
    Thanked 1 Time in 1 Post

    Default Re: Best way to implement an abstract syntax tree?

    Quote Originally Posted by copeg View Post
    You need to first define the relationship between the nodes and what the nodes and their organization represent (code and code structure). You could start by first creating a Node class (containing references to the parent and any children). From there you can define different nodes for different purposes (comparisons, control statements, variables, method calls, etc...) which could in part define how the tree will be traversed. How you design this is up to you (make Node abstract, create an interface that you pass to Node upon instantiation, etc...).

    For the example you posted, the func might define a node with one parent (the parent being a node representing the parameter value) and one child - the child being a comparison Node (evaluates to a boolean). The comparison Node in turn could have 2 child nodes - one of which returns the num value to its parent and the other returns the evaluation of the method call - how these are chose is based upon the evaluation of the boolean.
    I guess I can just try and keep it simple. Make a base node and then have other nodes for more specific behaviour. Thing is, I want to avoid too many casts and all that when I use the tree later on. Can think of other ways to do it. For example, when I searched on google someone mentioned that the visitor pattern could come in handy, but I am not sure if that is true or not. Another way could be to create one general node class that basically just defines a parent node, a list of child nodes and maybe contains a string representing any other data it may want to hold (like a variable name). Then an enum would be used to identify what kind of a node it is. But I dont know if that is a good idea. Rather tired at the moment, so I cannot think correctly, lol.

    As a note... I have a habit of getting stuck on the problems I see rather then the solutions to those problems. Which is probably a mayor reason to why I have issues figuring this out.

    My advice would be to start with something more trivial than a recursive method. Start with control structures and variable definitions, then move to method calls and things more complex.
    You are probably right. Just so easy to think too large xD.

    This is an excellent problem however, and thanks for asking the question because it has peaked my interest
    I have taken an interest in compilers and such things for some odd reason. It just seems like a very good challenge, and thus, a great way to improve as a programmer. Thats my theory, anyway .

  6. #6
    Super Moderator helloworld922's Avatar
    Join Date
    Jun 2009
    Posts
    2,896
    Thanks
    23
    Thanked 619 Times in 561 Posts
    Blog Entries
    18

    Default Re: Best way to implement an abstract syntax tree?

    I would suggest using ANTLR to create a simple scripting language. ANTLR is a tokenizer and recursive descent parser, perfect for what it sounds like you want to accomplish.

    If you want to write your own parser, you can look at the examples and see how ANTLR produces its AST's (they're not actually binary trees, each node can have any number of children).

    For more information on the method ANTLR uses: Wikipedia - Recursive Descent Parser
    Last edited by helloworld922; January 13th, 2012 at 11:32 AM.

  7. #7
    Member
    Join Date
    Mar 2011
    Location
    Earth!
    Posts
    77
    Thanks
    2
    Thanked 1 Time in 1 Post

    Default Re: Best way to implement an abstract syntax tree?

    Quote Originally Posted by helloworld922 View Post
    I would suggest using ANTLR to create a simple scripting language. ANTLR is a tokenizer and recursive descent parser, perfect for what it sounds like you want to accomplish.

    If you want to right your own parser, you can look at the examples and see how ANTLR produces its AST's (they're not actually binary trees, each node can have any number of children).

    For more information on the method ANTLR uses: Wikipedia - Recursive Descent Parser
    Thanks. I want to write my own parser, but as you suggested I can take a look at how they did it. Quite frankly I think that is a great idea!

  8. #8
    Member
    Join Date
    Mar 2011
    Location
    Earth!
    Posts
    77
    Thanks
    2
    Thanked 1 Time in 1 Post

    Default Re: Best way to implement an abstract syntax tree?

    An update. Have taken a look at ANTLR, and how it does things. So I have decided that I will implement the tree with this class:
    public class Node implements Iterable<Node> {
        private Node parent;
        private NodeType type;
        private String data;
        private int lineNum;
     
        protected List<Node> children;
     
        /**
         * Creates a new node with the given parent node and node type. It also
         * takes the line number the node was on in the source file as an argument
         * to enable easier debugging of scripts.
         * 
         * @param type the type of node
         * @param parent the parent node
         * @param lineNum the line number
         */
        public Node(NodeType type, Node parent, int lineNum) {
            this(type, parent, lineNum, null);
        }
     
        /**
         * Creates a new node instance with the given parent node, node type and
         * data string. It also takes the line number the node was on in the source
         * file as an argument to enable easier debugging of scripts.
         * 
         * @param type the type of node
         * @param parent the parent node
         * @param lineNum the line number
         * @param data the data of the node
         */
        public Node(NodeType type, Node parent, int lineNum, String data) {
            this.parent = parent;
            this.type = type;
            this.lineNum = lineNum;
            this.data = data;
            children = new ArrayList<>();
        }
     
        /**
         * Get the parent node of this node
         * 
         * @return the parent node
         */
        private Node getParent() {
            return parent;
        }
     
        /**
         * Get the node at the given index
         * 
         * @param index the index
         * @return the node at the given nodex
         */
        public Node getChild(int index) {
            return children.get(index);
        }
     
        /**
         * Sets the node at the given index
         * 
         * @param index the index
         * @param n the node
         */
        public void setChild(int index, Node n) {
            children.set(index, n);
        }
     
        /**
         * Adds a new node at the end of the node list
         * 
         * @param n the node to be added
         */
        public void addChild(Node n) {
            children.add(n);
        }
     
        /**
         * Add a collection of child nodes to this node
         * 
         * @param nodes the child nodes to be
         */
        public void addChildren(Collection<Node> nodes) {
            children.addAll(nodes);
        }
     
        /**
         * Removes the node at the given index
         * 
         * @param index 
         */
        public void removeChild(int index) {
            children.remove(index);
        }
     
        /**
         * How many child nodes this node has
         * 
         * @return the amount of child nodes of this node
         */
        public int childCount() {
            return children.size();
        }
     
        /**
         * Return the type of node
         * 
         * @return what type this node is
         */
        public NodeType getType() {
            return type;
        }
     
        /**
         * Return the data contained in the node, or null if none exists
         * 
         * @return the node data
         */
        public String getData() {
            return data;
        }
     
        /**
         * Return the line number of the node
         * 
         * @return the line number
         */
        public int getLineNumber() {
            return lineNum;
        }
     
        /**
         * Returns an iterator over the nodes
         * 
         * @return an iterator over the nodes
         */
        @Override
        public Iterator<Node> iterator() {
            return children.iterator();
        }
     
        /**
         * Creates a new root node
         * 
         * @return the new root node
         */
        public static Node newRootNode() {
            return new Node(NodeType.ROOT, null, 0, null);
        }
    }
    Basically each node will contain a node type, which says what it is representing (a loop, a function, an assingment, etc). Then it will have a string that contains any optional data, like a variable name or a number or something. Any opinions? Is this a good or bad way?

Similar Threads

  1. how to implement timed out event in java?!
    By migongotar in forum Java Theory & Questions
    Replies: 3
    Last Post: December 15th, 2018, 07:41 AM
  2. How do I implement a networking framework?
    By Niator in forum Java Networking
    Replies: 7
    Last Post: December 24th, 2011, 07:22 PM
  3. GUI error: is not abstract and does not override abstract method
    By djl1990 in forum What's Wrong With My Code?
    Replies: 3
    Last Post: October 21st, 2011, 01:26 PM
  4. Data Structures(Binary Search Tree to AVL Tree)ASAP
    By jfAdik in forum Algorithms & Recursion
    Replies: 2
    Last Post: April 5th, 2010, 03:58 AM
  5. Implement dragging a circle
    By jolly in forum AWT / Java Swing
    Replies: 0
    Last Post: August 19th, 2009, 02:48 PM