
Creating A Simple Compiler: Part 3

variable type

4 replies to this topic

#1 dargueta

dargueta

    I chown trolls.

  • Moderator
  • 4854 posts
  • Programming Language:C, Java, C++, PHP, Python, JavaScript, Perl, Assembly, Bash, Others
  • Learning:Objective-C

Posted 23 December 2009 - 09:23 AM

Hello, and welcome to part three of my tutorial series, Creating a Simple Compiler. In this part I'm going to go over a few more utility functions and show you how to use the match() and chartype() functions we made in the last tutorial.

Previous Tutorial Sections
Part 1 | Part 2

Using chartype()
As I set forth in part one of this series, identifiers in Nano may only consist of the letters A-Z and a-z, the digits 0-9, and underscores (_). In addition, they must begin with an underscore or a letter, never a digit.
Let's say that we know that the next character begins an identifier in our input stream, and we want to read in that identifier, but not anything following it. We could do something like this:

/*I'm assuming you have <fstream>, <iostream>, <string>, <cstdio>
and <cstdlib> included, plus the matching using declarations.*/

#define IS_IDCHAR(x)   (((x)==CHARTYPE_DIGIT)||((x)==CHARTYPE_LETTER)||((x)==CHARTYPE_USCORE))

string next_identifier(ifstream& input)
{
    string id;
    int ch, chtype;
    
    id = "";
    
    /*need to hardcode the first iteration because digits
    aren't allowed as the first character of an identifier,
    but they're perfectly legal for the rest of it*/

    ch = input.get();
    chtype = chartype(ch);
    if((chtype == CHARTYPE_LETTER) || (chtype == CHARTYPE_USCORE))
        id += (char)ch;
    else
    {
        /*at end of file get() extracts nothing, so only
        put back a character that was actually read*/
        if(ch != EOF)
            input.unget();
        /*cast to char so we print the character, not its numeric code*/
        cerr << "ERROR: Expected identifier, found \'" << (char)ch << "\'" << endl;
        exit(-1);
    }
    
    /*note that the rest of the identifier can have digits as well.*/
    while(true)
    {
        ch = input.get();
        chtype = chartype(ch);
        if( IS_IDCHAR(chtype) )
            id += (char)ch;
        else
        {
            /*not part of the identifier: leave it for the next reader*/
            if(ch != EOF)
                input.unget();
            break;
        }
    }
    
    return id;
}
Relatively straightforward, I'd say. We can write similar functions for other kinds of tokens just by replacing the IS_IDCHAR macro with one that describes what we're trying to read. For example, we could use the following:

#define IS_NUMBER(x)    ((x)==CHARTYPE_DIGIT)
Slightly redundant for this case, but it makes our code a bit more readable, and we can copy-paste code and just change a few things to customize it for the task. (I also can't think of any other examples at the moment. I'll fix it when I figure one out.)

Using match()
We want to use this function when we're expecting something. For example, if we have a list with more than one item, we're going to expect an item, a comma, and then at least one other item. If the second item is missing, we have a syntax error.

vector<string> identifier_list(ifstream& input)
{
    string thisid;
    vector<string> idlist;
    
    /*no semicolon after while(true), or the loop
    body would never run*/
    while(true)
    {
        /*you should be able to write skip_whitespace()
          on your own. trust me, it's easy. it needs the
          stream to read from, so we pass input along.*/
        skip_whitespace(input);
        thisid = next_identifier(input);
        
        if( thisid.empty() )
        {
            /*expected an identifier and found none.*/
            cerr << "ERROR: Expected identifier, found something else." << endl;
            exit(-1);
        }
        else
            idlist.push_back(thisid);
        
        skip_whitespace(input);
        
        /*if we have a comma, then we expect another identifier.
        if there is no comma, then we can just return our list.*/
        if( !match(',') )
            return idlist;
    }
}
Clearly we have a problem here. The way match() is written, it'll exit the program if it doesn't find a comma. Plus it returns void. We have to modify it to return a boolean instead of void; true if the character was found, false if it wasn't. It's not too difficult, so I'm leaving it as an exercise to the reader. I'd rather take up the space teaching you something than rewriting stuff.
So now we know how to read lists, get identifiers, get character types, and expect characters. So what do we do now? Build a symbol table.

Symbol Tables

What's a symbol table?
Symbol tables are how we keep track of what variables (and later functions, structs, classes, etc.) have been declared and where. If we don't keep track of them, then we have a language that automatically declares variables when it sees them, whether you've declared them yourself or not. Visual Basic is like this unless you include the directive Option Explicit. In my opinion, this is a bad idea. If you misspell a variable name, the compiler will assume that it's a different variable and declare it for you. Then you wonder why your program doesn't work. So what information do we need to include in this symbol table?
* Name
* Variable type
* File declared in
* Line declared on

For Nano, we don't really need the variable type, as I've mandated that all variables are integers. However, if we modify the language to allow other datatypes, we'll need this. Let's create a struct to hold all of this.

enum DATATYPE { TYPE_INTEGER };

typedef struct
{
    /*note I'm not including the
    variable name in here.*/

    DATATYPE        type;
    string          file;
    unsigned int    line;
}SYMTAB_ENTRY;

We can use std::map to handle all of our symbol table needs. We can use the variable name as the key, and the struct as the value. So our declaration of the map would be something like:

/*I'm assuming you have "#include <map>"
as well as "using std::map;" somewhere.*/

map <string, SYMTAB_ENTRY> symbol_table;
Now we can use the find() (or count()), insert(), and erase() methods to perform our work. Note that std::map has no remove() method, and contains() only exists as of C++20. But what do we do with it?
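
As a rough illustration of how lookups and insertions on such a map might look, here's a minimal sketch using the standard std::map members find() and insert(). The helper names is_declared() and declare() and the SYMTAB typedef are hypothetical, and the struct is repeated only so the snippet compiles on its own.

```cpp
#include <map>
#include <string>
#include <utility>

enum DATATYPE { TYPE_INTEGER };

struct SYMTAB_ENTRY
{
    DATATYPE     type;
    std::string  file;
    unsigned int line;
};

typedef std::map<std::string, SYMTAB_ENTRY> SYMTAB;

/* Hypothetical helper: true if `name` is already in the table. */
bool is_declared(const SYMTAB& table, const std::string& name)
{
    return table.find(name) != table.end();
}

/* Hypothetical helper: adds `name` and returns true, or returns false
   if it was already declared. map::insert() leaves an existing entry
   untouched and reports that through the .second of its result. */
bool declare(SYMTAB& table, const std::string& name)
{
    SYMTAB_ENTRY entry;
    entry.type = TYPE_INTEGER;
    entry.file = "";    /* placeholder, as in the tutorial */
    entry.line = 0;     /* placeholder, as in the tutorial */

    return table.insert(std::make_pair(name, entry)).second;
}
```

Removing a symbol is table.erase(name), which returns the number of entries removed (0 or 1 for a map).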

Declaring Variables
In our statement() function (which we have yet to finish), we expect to be at the beginning of a new statement. After we've read in the next identifier, one of the checks we need to make is to see if it's "var", which is how we declare our variables. After that we expect a list of identifiers. So...
void declare_variables(ifstream& input, map<string,SYMTAB_ENTRY>& symtab)
{
    vector<string> varlist;
    vector<string>::iterator iter;
    SYMTAB_ENTRY stentry;

    /*we're assuming that the "var" token has already been
    consumed and the next token is an identifier.*/
    
    skip_whitespace(input);
    varlist = identifier_list(input);

    /*start at the first name and step forward each time around*/
    for(iter = varlist.begin(); iter != varlist.end(); ++iter)
    {
        /*make sure the variable isn't in our table already.
        count() is 1 if the key exists and 0 if it doesn't.*/
        if( symtab.count(*iter) > 0 )
        {
            cerr << "ERROR: Duplicate declaration of \'" << *iter
            << "\'." << endl;
            exit(-1);
        }
        
        stentry.type = TYPE_INTEGER;
        stentry.file = "";  /*we'll figure this out later*/
        stentry.line = 0;   /*again, figure it out later*/

        /*map::insert() takes a key/value pair*/
        symtab.insert( make_pair(*iter, stentry) );
    }
}

That's enough for now. I don't know what I'm going to get to next time, so hold onto your socks until then. Hope you've enjoyed this latest installment--see you soon!

Edited by dargueta, 24 December 2009 - 08:28 AM.
Fixed minor error and added variable name in duplicate declaration


sudo rm -rf / && echo $'Sanitize your inputs!'


#2 WingedPanther73

WingedPanther73

    A spammer's worst nightmare

  • Moderator
  • 17757 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others

Posted 23 December 2009 - 10:06 AM

Another good one. +rep

Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

My MineCraft server site: http://banishedwings.enjin.com/


#3 MicahN

MicahN

    CC Lurker

  • Just Joined
  • Pip
  • 2 posts

Posted 23 December 2009 - 10:32 AM

awesome job this is looking really good :) cant wait for the next!

+rep

#4 dargueta

dargueta

    I chown trolls.

  • Moderator
  • 4854 posts
  • Programming Language:C, Java, C++, PHP, Python, JavaScript, Perl, Assembly, Bash, Others
  • Learning:Objective-C

Posted 24 December 2009 - 08:28 AM

Thanks, both of you!

sudo rm -rf / && echo $'Sanitize your inputs!'


#5 bekace

bekace

    CC Newcomer

  • Just Joined
  • PipPip
  • 12 posts

Posted 06 January 2010 - 03:57 PM

thanks




