+ Reply to Thread
Results 1 to 4 of 4

Thread: Creating A Simple Compiler: Part 2

  1. #1
    Join Date
    Oct 2007
    Location
    /dev/null
    Posts
    4,513
    Blog Entries
    8
    Rep Power
    59

    Creating A Simple Compiler: Part 2

    Hello, and welcome to part two of my tutorial series, Creating a Simple Compiler. In this part we're going to fully flesh out our CFG for Nano, and get started writing a few parsing utilities.

    Creating A Simple Compiler - Part 1

    A Full Grammar For Nano
    In the interest of time, I've written out the entire CFG for the Nano language. Take a few minutes to study it and figure out what's going on; try doing a few resolutions yourself if it helps. Pay particular attention to the rules for EXPRESSION, ASSIGNMENT, and STATEMENT, as they're some of the more important ones.
    Code:
    <DIGIT>     ::=  '0'..'9'
    <NUMBER>    ::=  <DIGIT>+
    
    <ANYCHAR>   ::=  '\x00' .. '\xFF'
    <COMMENT>   ::=  "/*"  <ANYCHAR>*  "*/"
    
    <LETTER>    ::=  'A'..'Z' | 'a'..'z'
    <IDCHAR>    ::=  <LETTER> | '_'
    <VARIABLE>  ::=  <IDCHAR>+ (<DIGIT> | <IDCHAR>)*
    
    <BIOP>      ::=  '+' | '-' | '*' | '/' | '%'
    <OPERATION> ::=  (<VARIABLE> | <NUMBER>) <BIOP> (<VARIABLE> | <NUMBER>)
    <EXPRESSION>::=  <OPERATION> (<BIOP> <OPERATION>)*
    <ASSIGNMENT>::=  <VARIABLE> '=' <EXPRESSION>
    <VARLIST>   ::=  <VARIABLE> (',' <VARIABLE>)*
    <DECLARATION>::=  "var" <VARLIST>
    <STATEMENT>   ::=  ( <ASSIGNMENT> | <DECLARATION> | ("output" <VARLIST>) | ("input" <VARLIST>) ) ';'
    So now that we've got our language defined, what're we going to do? We'll create a function for each of our rules. But where do we start? There's no starting state. Hm...let's add a few rules.

    Code:
    <EOF>       :=  $
    <START>     :=  <STATEMENT>* <EOF>
    So our program consists of a bunch of statements followed by EOF. Perfect.

    Creating Parsing Functions
    Let's create our functions to parse a Nano program. (We're assuming that we have a function called match() that consumes the next character in the input stream; if it matches the argument then it returns, otherwise it spits out an error message and exits the program.

    Code:
    #define     TOKEN_EOF   -1
    /*other #define character classes here*/
    
    void start(ifstream& input)
    {
        while(!input.eof())
            statement();
    
        match(input, TOKEN_EOF);
    }
    
    void statement(ifstream& input)
    {
        int statement_type;
    
        /*somehow figure out if we're dealing
        with an assignment, a declaration, an
        output statement, or an input statement.
        */
        
        switch(statement_type)
        {
            case STATEMENT_TYPE_ASSIGNMENT:
                assignment();
                break;
            case STATEMENT_TYPE_DECLARATION:
                declaration();
                break;
            case STATEMENT_TYPE_INPUT:
                doinput();
                break;
            case STATEMENT_TYPE_OUTPUT:
                dooutput();
                break;
            default:
                cerr << "FATAL ERROR: Unexpected statement type." << endl;
                exit(-1);
                break;
        }
    }
    
    void doinput(ifstream& input)
    {
        vector<VARIABLE_T> variables;
    
        varlist(input, variables);
    
        /*iterate through all the variables and
        generate the code to output their values*/
    }
    If we divide our tasks up like this, all we need to do is change our functions to match the CFG we generate. Relatively simple. But we need some utilities to help us. First we need match(), as you probably noticed in start(). We're also going to need to be able to determine the type of a character, so let's make a chartype() function as well. (We're going to need a few more, just hang on for a bit.)

    match()
    We're going to have to have two overloads of this function, because we need two versions of this: one to match a specific character, and one to match a range of characters. Let's make the one to match a single character first:
    Code:
    void match(ifstream& input, int expected)
    {
        int ch = input.peek();
    
        if(ch != expected)
        {
            cerr << "ERROR: Expected \'" << (char)expected
                    << "\' but found \'" << (char)ch << "\'"
                    << endl;
    
            exit(-1);
        }
    }
    Not too difficult, right? Let's try matching a character in a set of characters:
    Code:
    void match(ifstream& input, const char *charclass)
    {
        int ch = input.peek();
    
        if(strchr(charclass, ch) == NULL)
        {
            cerr << "ERROR: Expected a character in \'"
                    << charclass << "\' but found \'"
                    << (char)ch << "\'" << endl;
    
            exit(-1);
        }
    }
    In case you don't know what strchr() does

    chartype()
    Hm...how are we going to go about this? My way would be to have chartype() return an integer (an enum, really, but I hate using enums for some reason) describing the type of character it found. We'll have a few classes of characters, namely:
    Code:
    #define     CHARTYPE_EOF            -1
    #define     CHARTYPE_UNKNOWN        0       /* everything else */
    #define     CHARTYPE_WHITESPACE     1       /* '\x00' .. '\x1F' */
    #define     CHARTYPE_DIGIT          2       /* 0..9 */
    #define     CHARTYPE_LETTER         3       /* A..Z a..z */
    #define     CHARTYPE_OPERATOR       4       /* + - * / % = */
    #define     CHARTYPE_LPAREN         5       /* ( */
    #define     CHARTYPE_RPAREN         6       /* ) */
    #define     CHARTYPE_USCORE         7       /* _ */
    This isn't too hard:
    Code:
    #include <cctype>
    #include <cstring>
    
    int chartype(int ch)
    {
        if( ')' == (char)ch )
            return CHARTYPE_RPAREN;
        else if( '(' == (char)ch )
            return CHARTYPE_LPAREN;
        else if( strchr("+-*/%=", ch) )
            return CHARTYPE_OPERATOR;
        else if( isalpha(ch) )
            return CHARTYPE_LETTER;
        else if( isdigit(ch) )
            return CHARTYPE_DIGIT;
        else if( isspace(ch) || iscntrl(ch) )
            return CHARTYPE_WHITESPACE;
        else if( (char)ch == '_' )
            return CHARTYPE_USCORE;
        else if( ch == -1 )
            return CHARTYPE_EOF;
        else
            return CHARTYPE_UNKNOWN;
    }
    Slightly annoying, but it gets the job done. We'll add character classes and cases as this series goes on and our language gets more complicated.

    All right, I think this is long enough. I'll stop here for now. Next time I'm going to show you how we're going to use match() and chartype(), and then create a few more utility functions before we plough on.

    Stay tuned!
    Last edited by dargueta; 12-22-2009 at 05:38 PM. Reason: Forgot underscore
    sudo rm -rf /

  2. CODECALL Circuit advertisement
    Join Date
    Always
    Location
    Advertising world
    Posts
    Many

     
  3. #2
    Join Date
    Jul 2006
    Posts
    16,491
    Blog Entries
    75
    Rep Power
    143

    Re: Creating A Simple Compiler: Part 2

    Ahh... the wonderful world of parsing +rep
    Programming is a branch of mathematics.
    My CodeCall Blog | My Personal Blog

  4. #3
    Join Date
    Oct 2007
    Location
    /dev/null
    Posts
    4,513
    Blog Entries
    8
    Rep Power
    59

    Re: Creating A Simple Compiler: Part 2

    Wonderful indeed. The great thing about it is that it looks complicated, but if you know how to divvy it up the right way, it's really not that bad.
    sudo rm -rf /

  5. #4
    Join Date
    Jul 2009
    Location
    Santa Clarita, CA
    Posts
    2,111
    Blog Entries
    47
    Rep Power
    31

    Re: Creating A Simple Compiler: Part 2

    Excellent! +rep!
    Wow I changed my sig!

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Intermediate Creating A Simple Compiler: Part 6
    By dargueta in forum C Tutorials
    Replies: 10
    Last Post: 03-04-2011, 12:02 PM
  2. Creating A Simple Compiler: Part 1
    By dargueta in forum C Tutorials
    Replies: 29
    Last Post: 09-21-2010, 08:08 AM
  3. Creating a Simple Compiler: Part 5
    By dargueta in forum C Tutorials
    Replies: 1
    Last Post: 08-08-2010, 10:58 AM
  4. Creating A Simple Compiler: Part 4
    By dargueta in forum C Tutorials
    Replies: 7
    Last Post: 04-19-2010, 04:19 PM
  5. Creating A Simple Compiler: Part 3
    By dargueta in forum C Tutorials
    Replies: 4
    Last Post: 01-06-2010, 03:57 PM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts