
Thread: Creating A Simple Compiler: Part 3

  1. #1
    Join Date
    Oct 2007
    Location
    /dev/null
    Posts
    4,513
    Blog Entries
    8
    Rep Power
    59

    Creating A Simple Compiler: Part 3

    Hello, and welcome to part three of my tutorial series, Creating a Simple Compiler. In this part I'm going to go over a few more utility functions and show you how to use the match() and chartype() functions we made in the last tutorial.

    Previous Tutorial Sections
    Part 1 | Part 2

    Using chartype()
    As I set forth in part one of this series, identifiers in Nano may only consist of the letters A-Z and a-z, the digits 0-9, and underscores (_). In addition, they must begin with an underscore or a letter.
    Let's say that we know that the next character begins an identifier in our input stream, and we want to read in that identifier, but not anything following it. We could do something like this:

    Code:
    #define IS_IDCHAR(x)   (((x)==CHARTYPE_DIGIT)||((x)==CHARTYPE_LETTER)||((x)==CHARTYPE_USCORE))
    
    string next_identifier(ifstream& input)
    {
        string id;
        int ch, chtype;
        
        /*need to hardcode the first iteration because digits
        aren't allowed as the first character of an identifier,
        but they're perfectly legal for the rest of it*/
    
        /*ifstream has get() and unget(), not getc() and ungetc()*/
        ch = input.get();
        chtype = chartype(ch);
        if((chtype == CHARTYPE_LETTER) || (chtype == CHARTYPE_USCORE))
            id += (char)ch;
        else
        {
            input.unget();
            /*cast to char so we print the character, not its numeric code*/
            cerr << "ERROR: Expected identifier, found \'" << (char)ch << "\'" << endl;
            exit(-1);
        }
        
        /*note that the rest of the identifier can have digits as well.*/
        while(true)
        {
            ch = input.get();
            chtype = chartype(ch);
            if( IS_IDCHAR(chtype) )
                id += (char)ch;
            else
            {
                /*put back the character that ended the identifier*/
                input.unget();
                break;
            }
        }
        
        return id;
    }
    Relatively straightforward, I'd say. We can do similar functions for other types just by replacing the IS_IDCHAR macro with what we're trying to read. For example, we could use the following:

    Code:
    #define IS_NUMBER(x)    ((x)==CHARTYPE_DIGIT)
    Slightly redundant for this case, but it makes our code a bit more readable, and we can copy-paste code and just change a few things to customize it for the task. (I also can't think of any other examples at the moment. I'll fix it when I figure one out.)

    Using match()
    We want to use this function when the grammar requires a specific character at the current position. For example, if we have a list with more than one item, we're going to expect an item, a comma, and then at least one other item. If a comma isn't followed by another item, we have a syntax error.

    Code:
    vector<string> identifier_list(ifstream& input)
    {
        string thisid;
        vector<string> idlist;
        
        while(true)   /*no semicolon here, or the loop body never runs*/
        {
            /*you should be able to write this
              on your own. trust me, it's easy*/
            skip_whitespace();
            thisid = next_identifier(input);
            
            if( thisid.compare("") == 0 )
            {
                /*expected an identifier and found none.*/
                cerr << "ERROR: Expected identifier, found something else." << endl;
                exit(-1);
            }
            else
                idlist.push_back(thisid);
            
            skip_whitespace();
            
            /*if we have a comma, then we expect another identifier.
            if there is no comma, then we can just return our list.*/
            if( !match(',') )
                return idlist;
        }
    }
    Clearly we have a problem here. The way match() is written, it'll exit the program if it doesn't find a comma. Plus it returns void, so it can't be used in a condition at all. We have to modify it to return a boolean instead: true if the character was found, false if it wasn't. It's not too difficult, so I'm leaving it as an exercise for the reader. I'd rather take up the space teaching you something than rewriting stuff.
    So now we know how to read lists, get identifiers, get character types, and expect characters. So what do we do now? Build a symbol table.

    Symbol Tables

    What's a symbol table?
    Symbol tables are how we keep track of which variables (and later functions, structs, classes, etc.) have been declared and where. If we don't keep track of them, then we have a language that automatically declares variables when it sees them, whether you've declared them yourself or not. Visual Basic is like this unless you include the directive Option Explicit. In my opinion, this is a bad idea: if you misspell a variable name, the compiler will assume that it's a different variable and declare it for you, and then you wonder why your program doesn't work. So what information do we need to include in this symbol table?
    * Name
    * Variable type
    * File declared in
    * Line declared on

    For Nano, we don't really need the variable type, as I've mandated that all variables are integers. However, if we modify the language to allow other datatypes, we'll need this. Let's create a struct to hold all of this.

    Code:
    enum DATATYPE { TYPE_INTEGER };
    
    typedef struct
    {
        /*note I'm not including the
        variable name in here.*/
    
        DATATYPE        type;
        string          file;
        unsigned int    line;
    }SYMTAB_ENTRY;
    We can use std::map to handle all of our symbol table needs. We can use the variable name as the key, and the struct as the value. So our declaration of the map would be something like:

    Code:
    /*I'm assuming you have "#include <map>"
    as well as "using std::map;" somewhere.*/
    
    map <string, SYMTAB_ENTRY> symbol_table;
    Now we can use the find(), insert(), and erase() methods to perform our work (std::map has no remove(), and contains() only exists from C++20 on). But what do we do with it?
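
    As a minimal sketch of the three operations we need from the table (look up, add, remove), written against std::map's find(), insert() and erase(). The helper names is_declared(), declare() and undeclare() and the SYMTAB typedef are hypothetical, introduced only for this illustration.

```cpp
#include <map>
#include <string>
#include <utility>

enum DATATYPE { TYPE_INTEGER };

struct SYMTAB_ENTRY
{
    DATATYPE     type;
    std::string  file;
    unsigned int line;
};

typedef std::map<std::string, SYMTAB_ENTRY> SYMTAB;

// True if `name` is already in the table.
bool is_declared(const SYMTAB& symtab, const std::string& name)
{
    return symtab.find(name) != symtab.end();
}

// Adds `name` to the table. insert() refuses to overwrite an existing
// key, so this returns false (and changes nothing) on a duplicate.
bool declare(SYMTAB& symtab, const std::string& name, const SYMTAB_ENTRY& entry)
{
    return symtab.insert(std::make_pair(name, entry)).second;
}

// Removes `name`; returns true if something was actually erased.
bool undeclare(SYMTAB& symtab, const std::string& name)
{
    return symtab.erase(name) > 0;
}
```

    Note that insert() takes a single key/value pair rather than two separate arguments, which is why std::make_pair() shows up here.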

    Declaring Variables
    In our statement() function (which we have yet to finish), we expect to be at the beginning of a new statement. After we've read in the next identifier, one of the checks we need to make is to see if it's "var", which is how we declare our variables. After that we expect a list of identifiers. So...
    Code:
    void declare_variables(ifstream& input, map<string,SYMTAB_ENTRY>& symtab)
    {
        vector<string> varlist;
        vector<string>::iterator iter;
        SYMTAB_ENTRY stentry;
    
        /*we're assuming that the "var" token has already been
        consumed and the next token is an identifier.*/
        
        skip_whitespace();
        varlist = identifier_list(input);
        for(iter = varlist.begin(); iter != varlist.end(); ++iter)
        {
            /*make sure the variable isn't in our table already*/
            if( symtab.find(*iter) != symtab.end() )
            {
                cerr << "ERROR: Duplicate declaration of \'" << *iter
                << "\'." << endl;
                exit(-1);
            }
            
            stentry.type = TYPE_INTEGER;
            stentry.file = "";  /*we'll figure this out later*/
            stentry.line = 0;   /*again, figure it out later*/
            /*map::insert() takes one key/value pair*/
            symtab.insert( std::make_pair(*iter, stentry) );
        }
    }
    That's enough for now. I don't know what I'm going to get to next time, so hold onto your socks until then. Hope you've enjoyed this latest installment--see you soon!
    Last edited by dargueta; 12-24-2009 at 08:28 AM. Reason: Fixed minor error and added variable name in duplicate declaration
    sudo rm -rf /

  3. #2
    Join Date
    Jul 2006
    Posts
    16,491
    Blog Entries
    75
    Rep Power
    143

    Re: Creating A Simple Compiler: Part 3

    Another good one. +rep
    Programming is a branch of mathematics.
    My CodeCall Blog | My Personal Blog

  4. #3
    MicahN is offline Newbie
    Join Date
    Dec 2009
    Posts
    2
    Rep Power
    0

    Re: Creating A Simple Compiler: Part 3

    awesome job this is looking really good cant wait for the next!

    +rep

  5. #4
    Join Date
    Oct 2007
    Location
    /dev/null
    Posts
    4,513
    Blog Entries
    8
    Rep Power
    59

    Re: Creating A Simple Compiler: Part 3

    Thanks, both of you!
    sudo rm -rf /

  6. #5
    bekace is offline Newbie
    Join Date
    Jan 2010
    Posts
    13
    Rep Power
    0

    Re: Creating A Simple Compiler: Part 3

    thanks
