+ Reply to Thread
Results 1 to 2 of 2

Thread: Creating a Simple Compiler: Part 5

  1. #1
    Join Date
    Oct 2007
    Location
    /dev/null
    Posts
    4,513
    Blog Entries
    8
    Rep Power
    59

    Creating a Simple Compiler: Part 5

    This is part five of my tutorial series Creating a Simple Compiler. Today we're going to get into a bit about state machines, and how they're going to make our job compiling stuff easier.

    Creating a Language
    First we've got to create our own language. Because this is just a tutorial series, we're going to start out with something really basic, and then move on to more complicated stuff in later tutorials. We'll call this language Eusil. (Extremely USeless Instructional Language.) We'll give it a few basic features and then see what we can do from there.
    Rules:
    -> All programs must start with enter and end with exit. They must be on their own line.
    -> Comments begin with # and continue to the end of the line.
    -> Whitespace is not significant except in string constants.
    -> All statements must end in a semicolon.
    -> There is only one data type, the signed integer. Its keyword is int.
    -> Variables declarations will begin with var followed by the datatype. A variable name may only consist of alphabetic characters. Names are case-sensitive.
    -> Only basic arithmetic operations are allowed, i.e. + - * /. No binary operations.
    -> Other keywords: print and prompt, which are equivalent to castrated versions of printf() and scanf() respectively.

    A sample program:
    Code:
    enter
        # this is just a sample program of our Eusil language.
        
        var int i;
        i = 0;
        i = i + 5;
        i = i / 2;
        
        #
        # Note that prompt takes one string and one variable. It's equivalent to
        #   printf("%s", <prompt-string>);
        #   scanf("%d", &<variable>);
        #
        prompt("Please enter a value.", i);
        i = i + 1;
        print "Your value plus one is: ";
        print i;
        print "\n";
    exit
    Trying to tackle this all at once in one huge function is going to turn into a huge mess that'll make your head explode. Not exactly desirable. If, however, we break this into smaller pieces, it'll be much easier to code and fix. Plus we can change the order of execution and how they call each other as well. How are we going to do this? We can use a state machine as a model.

    Basic State Machines
    Do you remember the context-free grammars we talked about earlier? (If not go back and read the other three parts.) A state machine can be derived from context-free grammars. Think of it as a flowchart:
    Code:
       ----------------------> _ Die
      /                       /
    WakeUp --> Code -----> Tired --> Sleep
    ^          ^   \_ Hungry            |
    |          |        \_ Eat          |
    |          |____________|           |
    |___________________________________/
    At each step we get to make decisions on where to go next, and all together these essentially describe the life of a programmer. We can do the same thing with our language, except it's a bit more complicated. Why don't we start out with the big picture, and work our way down to the details.
    First we have:
    Code:
    Program
    Great and obvious. Let's expand that a bit.
    Code:
    enter --> code --> exit
    Ok...now we're getting somewhere. Each program is made up of one or more statements. So we have...
    Code:
    enter --> (statement)+ --> exit
    Each statement is made up of one or more expressions. Expand that...
    Code:
    enter --> ((expression)+)+ --> exit
    Now it gets ugly. We can have a few kinds of expressions in Eusil:
    1) Comment
    2) Arithmetic operation
    3) Assignment
    4) Variable declaration
    5) Function call

    If we try to expand that:
    Code:
                        enter
                          |
          _____________expression___________
         /                |                 \
         |               _| _               |
         |__Comment____/ /|\  \___Call______|
         |              / | \               |
         |__Arith.____ /  |  \_Var dec. ___/
          \_____Assmt_/   |
                          |
                         exit
    Yikes. Imagine what it'd look like in C++.

    How are we going to do this in our Eusil compiler? We can create functions for each step, and then call them as necesary from the other ones. If you do it right, it'll match your flowchart exactly. Let's get started by converting our flowchart into some functions:
    Code:
    #ifndef __EUSIL_LANGUAGE_H__
    #define __EUSIL_LANGUAGE_H__
    
    int eusil_program();
    
    int eusil_enter();
    
    int eusil_statement_list();
    int eusil_statement();
    
    int eusil_expression_list();
    int eusil_expression();
    
    int eusil_comment();
    int eusil_call();
    int eusil_arithmetic();
    int eusil_assignment();
    int eusil_declaration();
    
    int eusil_exit();
    
    #endif
    If we follow our flow chart, we get something like:
    Code:
    int eusil_program() {
        eusil_enter();
        eusil_statement_list();
        eusil_exit();
    
        return 0;
    }
    
    int eusil_enter() {
        /* Figure out where the "enter" statement in our program is */
    }
    
    int eusil_statement_list() {
    
        while( blah ) {
            ...
            eusil_statement();
            ...
        }
        
        return 0;
    }
    
    int eusil_statement() {
        ...
        eusil_expression_list();
        ...
        return 0;
    }
    
    int eusil_expression_list() {
        while( blah ) {
            ...
            eusil_expression();
            ...
        }
    }
    
    int eusil_expression() {
        /* Figure out the appropriate expression type and call those functions */
    }
    .
    .
    .
    You get the idea. A few other things we need are:
    -> A way to keep track of what variables have been declared.
    -> A way to parse integers, comments, and identifiers (variable names and keywords).
    -> A way to differentiate between those three.

    We need to define some rules for our basic types.
    -> An integer consists of one or more decimal digits 0-9, optionally preceded by a - to indicate a negative number. The context-free grammar form would look something like this:
    Code:
    INTEGER := (-)?([0-9]+)
    (Note that ? means "zero or one" and + means "one or more.")

    A comment is defined as a # followed by anything, up until the end of the line:
    Code:
    COMMENT := #(.*)\n
    . means a single character, and * means "zero or more." So ".*" means "zero or more of any character." If we want to match a literal period or asterisk, we use \. or \* . The problem with this definition is that it allows newlines to be matched as well, so we want to specify everything except a newline. We can do this by using character classes, just like in scanf() :

    Code:
    COMMENT := #([^\n]*)\n
    Better.

    Identifiers--i.e. variable names and keywords--consist of only alphabetic characters. That's an easy one:
    Code:
    IDENTIFIER := [A-Za-z]+
    Arithmetic operators:
    Code:
    OPERATOR := [\*\\+-]
    Note that we need to escape * and \ to force them to be interpreted as literal instead of using their special functions. We also have to put - at the very end or else we're going to accidentally interpret it as a range, like I showed you above in the identifier expression.

    Assignment operator:
    Code:
    ASSIGN := =
    We're going to need to write functions to identify and load each of these. For example, we could do:
    Code:
    int get_identifier(char *buf) {
        return scanf("%[A-Za-z]", buf);
    }
    What if we don't know what's going to come next? Hehehe...you'll find out next time.
    sudo rm -rf /

  2. CODECALL Circuit advertisement
    Join Date
    Always
    Location
    Advertising world
    Posts
    Many

     
  3. #2
    spyder's Avatar
    spyder is offline Programmer
    Join Date
    Jul 2010
    Location
    In your head, doing all your programming for you.
    Posts
    120
    Blog Entries
    2
    Rep Power
    0

    Re: Creating a Simple Compiler: Part 5

    Thanks you! I have been waiting for this part for a while, it ought to keep me going while I wait for the next part! +rep
    Last edited by spyder; 08-08-2010 at 12:41 PM.
    I C!(and Python, and C++, and ...)

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Intermediate Creating A Simple Compiler: Part 6
    By dargueta in forum C Tutorials
    Replies: 10
    Last Post: 03-04-2011, 12:02 PM
  2. Creating A Simple Compiler: Part 1
    By dargueta in forum C Tutorials
    Replies: 29
    Last Post: 09-21-2010, 08:08 AM
  3. Creating A Simple Compiler: Part 4
    By dargueta in forum C Tutorials
    Replies: 7
    Last Post: 04-19-2010, 04:19 PM
  4. Creating A Simple Compiler: Part 2
    By dargueta in forum C Tutorials
    Replies: 3
    Last Post: 01-08-2010, 12:25 AM
  5. Creating A Simple Compiler: Part 3
    By dargueta in forum C Tutorials
    Replies: 4
    Last Post: 01-06-2010, 03:57 PM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts