Hello, and welcome to part three of my tutorial series, Creating a Simple Compiler. In this part I'm going to go over a few more utility functions and show you how to use the match() and chartype() functions we made in the last tutorial.
Previous Tutorial Sections
Part 1 | Part 2
Using chartype()
As I set forth in part one of this series, identifiers in Nano are only permitted to consist of letters A-Z and a-z, digits 0-9, and underscores (_). In addition, they must begin with an underscore or a letter.
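To make the rule concrete, here is a minimal sketch of a standalone check. The name is_valid_identifier is hypothetical and is not part of the tutorial's code; it uses the standard <cctype> functions instead of chartype(), and it assumes the default "C" locale, where std::isalpha only accepts A-Z and a-z.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of the tutorial's code): returns true when
// `s` follows the identifier rules described above.
bool is_valid_identifier(const std::string& s)
{
    if (s.empty())
        return false;

    // First character: a letter or an underscore, never a digit.
    unsigned char first = static_cast<unsigned char>(s[0]);
    if (!(std::isalpha(first) || first == '_'))
        return false;

    // Remaining characters: letters, digits, or underscores.
    for (std::string::size_type i = 1; i < s.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (!(std::isalnum(c) || c == '_'))
            return false;
    }
    return true;
}
```

Under these rules "_foo1" is accepted, while "1abc" and "a-b" are rejected.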
Let's say that we know that the next character in our input stream begins an identifier, and we want to read in that identifier, but not anything following it. We could do something like this:

Code:
#define IS_IDCHAR(x) (((x)==CHARTYPE_DIGIT)||((x)==CHARTYPE_LETTER)||((x)==CHARTYPE_USCORE))

string next_identifier(ifstream& input)
{
    string id;
    int ch, chtype;

    /*need to hardcode the first iteration because digits aren't allowed as
      the first character of an identifier, but they're perfectly legal for
      the rest of it*/
    ch = input.get();
    chtype = chartype(ch);
    if((chtype == CHARTYPE_LETTER) || (chtype == CHARTYPE_USCORE))
        id += (char)ch;
    else
    {
        /*put the character back so the caller can still see it*/
        input.unget();
        cerr << "ERROR: Expected identifier, found '" << (char)ch << "'" << endl;
        exit(-1);
    }

    /*note that the rest of the identifier can have digits as well.*/
    while(true)
    {
        ch = input.get();
        chtype = chartype(ch);
        if( IS_IDCHAR(chtype) )
            id += (char)ch;
        else
        {
            /*not part of the identifier; leave it in the stream*/
            input.unget();
            break;
        }
    }
    return id;
}

Relatively straightforward, I'd say. We can write similar functions for other token types just by replacing the IS_IDCHAR macro with a test for what we're trying to read. For example, we could use the following:

Code:
#define IS_NUMBER(x) ((x)==CHARTYPE_DIGIT)

Slightly redundant for this case, but it makes our code a bit more readable, and we can copy-paste code and just change a few things to customize it for the task. (I also can't think of any other examples at the moment. I'll fix it when I figure one out.)
Using match()
We want to use this function when we're expecting something. For example, if we have a list with more than one item, we're going to expect an item, a comma, and then at least one other item. If the second item is missing, we have a syntax error.

Code:
vector<string> identifier_list(ifstream& input)
{
    string thisid;
    vector<string> idlist;

    while(true)
    {
        /*you should be able to write this on your own. trust me, it's easy*/
        skip_whitespace();
        thisid = next_identifier(input);
        if( thisid.empty() )
        {
            /*expected an identifier and found none.*/
            cerr << "ERROR: Expected identifier, found something else." << endl;
            exit(-1);
        }
        else
            idlist.push_back(thisid);

        skip_whitespace();
        /*if we have a comma, then we expect another identifier. if there is
          no comma, then we can just return our list.*/
        if( !match(',') )
            return idlist;
    }
}

Clearly we have a problem here. The way match() is written, it'll exit the program if it doesn't find a comma. Plus it returns void. We have to modify it to return a boolean instead of void: true if the character was found, false if it wasn't. It's not too difficult, so I'm leaving it as an exercise for the reader. I'd rather take up the space teaching you something than rewriting stuff.
So now we know how to read lists, get identifiers, get character types, and expect characters. So what do we do now? Build a symbol table.
Symbol Tables
What's a symbol table?
Symbol tables are how we keep track of what variables (and later functions, structs, classes, etc.) have been declared and where. If we don't keep track of them, then we have a language that automatically declares variables when it sees them, whether you've declared it yourself or not. Visual Basic is like this unless you include the directive Option Explicit. In my opinion, this is a bad idea. If you misspell a variable name, the compiler will automatically assume that it's a different variable and declare it for you. Then you wonder why your program doesn't work. So what information do we need to include in this symbol table?
* Name
* Variable type
* File declared in
* Line declared on
For Nano, we don't really need the variable type, as I've mandated that all variables are integers. However, if we modify the language to allow other datatypes, we'll need this. Let's create a struct to hold all of this.

Code:
enum DATATYPE { TYPE_INTEGER };

typedef struct
{
    /*note I'm not including the variable name in here.*/
    DATATYPE type;
    string file;
    unsigned int line;
} SYMTAB_ENTRY;

We can use std::map to handle all of our symbol table needs. We can use the variable name as the key, and the struct as the value. So our declaration of the map would be something like:

Code:
/*I'm assuming you have "#include <map>" as well as "using std::map;" somewhere.*/
map<string, SYMTAB_ENTRY> symbol_table;

Now we can use the find(), insert(), and erase() methods to perform our work. But what do we do with it?
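Here is a small self-contained sketch of those map operations. Note that std::map's own member names are find(), insert(), and erase(); a contains() member only arrived in C++20. The type and function names below (SymtabEntry, SymbolTable, declare_symbol, is_declared) are hypothetical and chosen so they don't clash with the tutorial's own declarations.

```cpp
#include <map>
#include <string>
#include <utility>

enum DataType { TYPE_INTEGER };

// Same fields as the tutorial's entry struct; the name lives in the map key.
struct SymtabEntry {
    DataType type;
    std::string file;
    unsigned int line;
};

typedef std::map<std::string, SymtabEntry> SymbolTable;

// Hypothetical helper: adds `name` to the table. Returns false (and leaves
// the table unchanged) if the name was already declared.
bool declare_symbol(SymbolTable& table, const std::string& name,
                    const std::string& file, unsigned int line)
{
    SymtabEntry entry;
    entry.type = TYPE_INTEGER;
    entry.file = file;
    entry.line = line;
    // map::insert returns a pair; .second is false when the key already exists.
    return table.insert(std::make_pair(name, entry)).second;
}

// Hypothetical helper: true if `name` has been declared.
bool is_declared(const SymbolTable& table, const std::string& name)
{
    return table.find(name) != table.end();
}
```

Relying on the return value of insert() folds the duplicate check and the insertion into a single lookup, though a separate find() beforehand works just as well.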
Declaring Variables
In our statement() function (which we have yet to finish), we expect to be at the beginning of a new statement. After we've read in the next identifier, one of the checks we need to make is to see if it's "var", which is how we declare our variables. After that we expect a list of identifiers. So...

Code:
void declare_variables(ifstream& input, map<string,SYMTAB_ENTRY>& symtab)
{
    vector<string> varlist;
    vector<string>::iterator iter;
    SYMTAB_ENTRY stentry;

    /*we're assuming that the "var" token has already been consumed and the
      next token is an identifier.*/
    skip_whitespace();
    varlist = identifier_list(input);

    for(iter = varlist.begin(); iter != varlist.end(); ++iter)
    {
        /*make sure the variable isn't in our table already*/
        if( symtab.find(*iter) != symtab.end() )
        {
            cerr << "ERROR: Duplicate declaration of '" << *iter << "'." << endl;
            exit(-1);
        }
        stentry.type = TYPE_INTEGER;
        stentry.file = ""; /*we'll figure this out later*/
        stentry.line = 0;  /*again, figure it out later*/
        /*map::insert takes a key/value pair, not two separate arguments*/
        symtab.insert( std::make_pair(*iter, stentry) );
    }
}

That's enough for now. I don't know what I'm going to get to next time, so hold onto your socks until then. Hope you've enjoyed this latest installment--see you soon!
Last edited by dargueta; 12-24-2009 at 08:28 AM. Reason: Fixed minor error and added variable name in duplicate declaration
Another good one. +rep
Awesome job, this is looking really good. Can't wait for the next!
+rep
Thanks, both of you!
thanks