+ Reply to Thread
Results 1 to 8 of 8

Thread: Creating A Simple Compiler: Part 4

  1. #1
    Join Date
    Oct 2007
    Location
    /dev/null
    Posts
    4,513
    Blog Entries
    8
    Rep Power
    59

    Creating A Simple Compiler: Part 4

    This is part four of my tutorial series Creating a Simple Compiler. Today we're going to talk about more parsing techniques, starting with some simple arithmetic parsing, and then getting on to operator precedence.

    Previous Tutorial Sections
    Part 1 | Part 2 | Part 3

    Polish Notation
    A parsing tree is a sort of tree used for parsing expressions. Take the expression a + b * c. Because of operator precedence, we have to evaluate b * c before we evaluate a + <result>. Imagine a stack that can hold an arbitrary number of symbols, namely identifiers and operators. If we push an expression symbol by symbol onto the stack as we encounter them (with one little exception), we get what's called Polish notation when we pop the symbols off. The little exception is we have to push the operator after its operands. So b * c is pushed onto the stack as b, c, *. Let's try this with a + b * c:
    Code:
    *  <-- top of stack
    c
    b
    +
    a  <-- bottom of stack
    If we now pop symbols off the tree in order, we get * b c + a. Wonderful...but how do we use this? It's relatively simple: for those of you who know assembly language, you'll realize that there's almost a one-to-one correspondence with instructions and their operands.
    Code:
    MUL     b, c, TEMP1
    ADD     a, TEMP1, TEMP2
    Obviously the actual format depends on the restrictions of the target language you're using. I'm going with Intel x86, so this is completely illegal, but I'm using it to illustrate a point. If you're curious, in proper Intel assembly language (NASM format) this would look something like:
    Code:
    mov     eax, [b]
    mul     [c]
    add     eax, [a]
    (For those of you who are anal-retentive Intel programmers, I'm ignoring the possibility of EAX overflow into EDX for now. If you don't know what this means, don't worry about it.)

    Clearly we have a problem, though: what if we have a * b + c? Our parser is going to produce + b c * a, which is incorrect. It should be * a b + c. Hmm...we could run the expression through a preparser that'd insert parentheses in the proper places, or we could use a parsing tree.

    Parsing Trees
    A parsing tree is, well, a tree. Its nodes can contain operands or operators, and it's used to force precedence. A simple parsing tree for a + b would look like this:
    Code:
        +
       / \
      a   b
    And for our troublesome expression:
    Code:
        +
       / \
      *   c
     / \
    a   b
    Each internal node in the tree is an operator, and each of the leaves are operands. The real question now is: how do we force * to have a higher precedence over + ? Deep waters, Watson.

    Parsing Step-by-Step
    For this we're going to have to get our identifier and the next identifier as well. That is, we read in a and then * first; when we hit * we realize that we're dealing with multiplication. So we create a partial tree:
    Code:
        *
       /
      a
    Not very exciting. Then we hit b +. We know that + has a lower precedence than * so it must go above the multiplication in the tree:
    Code:
        +
       / \
      *
     / \
    a   b
    Clearly we need at least one more operand to complete the tree. We're going to read in c and then the end-of-statement marker (in this case), so we know that our final tree looks like this:
    Code:
        +
       / \
      *   c
     / \
    a   b
    Dealing With Parentheses
    Just to strengthen your grasp of this, let's try something harder: a * ( b + c ) + 100. Hm...we haven't done parentheses before. Never fear, it's not as difficult as it seems.
    First we read in a *. So far so good:
    Code:
        *
       /
      a
    Then we hit the fun part. Next we read in ( b. We see the opening parenthesis, and immediately know that we must force precedence. The way we do it is by inserting some special symbol that doesn't do anything, but merely forces the addition down a level in the tree (i.e. forces higher precedence). Just for fun I'm going to use the dollar sign, since it isn't an operator in our language.
    Code:
        *
       / \
      a   $
    To resume normal operation, we can just read in one more token (the plus sign) and use our original parsing function to generate the next part of the tree:
    Code:
        *
       / \
      a   $
          |
          +
         /
        b
    After that, we read in c ). The closing parenthesis is just a signal to move back up in the parsing tree to the level above the $. Now we have:
    Code:
        *
       / \
      a   $
          |
          +
         / \
        b   c
    Lastly we get + 100 from our input stream, and parse it as usual:
    Code:
          +
         / \
        *   100
       / \
      a   $
          |
          +
         / \
        b   c
    I would show you the basic code for this, but this tutorial is getting really long and I want to finish with a few key concepts, so I'll save that for next time. If you're curious, here's what the Polish notation for our ugly little expression looks like (ignoring $ as a null operation)
    Code:
    + * a + b c 100
    Ew. This is going to be a pain in the butt to turn into assembly code. Never fear, there's an easier way. I didn't teach you about parsing trees for nothing.

    Generating Polish and Reverse-Polish From Parsing Trees
    If you've dealt with binary trees before, you may notice that we can generate Polish notation by traversing the tree in preorder. If you're a little rusty or not all that familiar with binary trees, read the articles I've linked to above.
    Anyway, we want to generate our assembly instructions by drilling down to the bottom of the tree (i.e. the + b c) and working our way back up, generating instructions whenever we hit an operator. For example:
    Code:
        TREE            CODE
          +             add     b, c, TEMP1
         / \
        *   100
       / \
      a   $
          |
          +
         / \
        b   c
    
    Code:
        TREE            CODE
          +             add     b, c, TEMP1
         / \            mul     a, TEMP1, TEMP2
        *   100
       / \
      a   $
          |
          +
         / \
        b   c
    Code:
        TREE            CODE
          +             add     b, c, TEMP1
         / \            mul     a, TEMP1, TEMP2
        *   100         add     TEMP2, 100, TEMP3
       / \
      a   $
          |
          +
         / \
        b   c
    That's it!

    This tutorial has gone on long enough, so I'm going to dedicate a significant portion of the next tutorial showing you how to do all this stuff with real C/C++ code, and then move on to...something. Dunno.

    Stay tuned!
    sudo rm -rf /

  2. CODECALL Circuit advertisement
    Join Date
    Always
    Location
    Advertising world
    Posts
    Many

     
  3. #2
    Join Date
    Jul 2006
    Posts
    16,491
    Blog Entries
    75
    Rep Power
    143

    Re: Creating A Simple Compiler: Part 4

    Well done! +rep

    Gotta love parsing trees as well
    Programming is a branch of mathematics.
    My CodeCall Blog | My Personal Blog

  4. #3
    bekace is offline Newbie
    Join Date
    Jan 2010
    Posts
    13
    Rep Power
    0

    Re: Creating A Simple Compiler: Part 4

    thanks

  5. #4
    Join Date
    Sep 2009
    Location
    USA
    Posts
    3,400
    Blog Entries
    5
    Rep Power
    37

    Re: Creating A Simple Compiler: Part 4



    I really like this series.
    Root Beer == System Administrator's Beer
    Download the new operating system programming kit! (some assembly required)

  6. #5
    arkanion is offline Newbie
    Join Date
    Jan 2010
    Posts
    10
    Rep Power
    0

    Re: Creating A Simple Compiler: Part 4

    good job man thanks

  7. #6
    se7en is offline Banned
    Join Date
    Nov 2009
    Posts
    30
    Rep Power
    0

    Re: Creating A Simple Compiler: Part 4

    PLease continue this tutorial??

  8. #7
    outsid3r's Avatar
    outsid3r is offline Programming God
    Join Date
    Jul 2008
    Posts
    621
    Rep Power
    19

    Re: Creating A Simple Compiler: Part 4

    nice

    +rep

  9. #8
    Join Date
    Oct 2007
    Location
    /dev/null
    Posts
    4,513
    Blog Entries
    8
    Rep Power
    59

    Re: Creating A Simple Compiler: Part 4

    I will eventually, but school's killing me right now.
    sudo rm -rf /

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Intermediate Creating A Simple Compiler: Part 6
    By dargueta in forum C Tutorials
    Replies: 10
    Last Post: 03-04-2011, 12:02 PM
  2. Creating A Simple Compiler: Part 1
    By dargueta in forum C Tutorials
    Replies: 29
    Last Post: 09-21-2010, 08:08 AM
  3. Creating a Simple Compiler: Part 5
    By dargueta in forum C Tutorials
    Replies: 1
    Last Post: 08-08-2010, 10:58 AM
  4. Creating A Simple Compiler: Part 2
    By dargueta in forum C Tutorials
    Replies: 3
    Last Post: 01-08-2010, 12:25 AM
  5. Creating A Simple Compiler: Part 3
    By dargueta in forum C Tutorials
    Replies: 4
    Last Post: 01-06-2010, 03:57 PM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts