Using chartype()
As I set forth in part one of this series, identifiers in Nano may only consist of the letters A-Z and a-z, the digits 0-9, and underscores (_). In addition, they must begin with an underscore or a letter.
Let's say that we know that the next character begins an identifier in our input stream, and we want to read in that identifier, but not anything following it. We could do something like this:
    #define IS_IDCHAR(x) (((x)==CHARTYPE_DIGIT)||((x)==CHARTYPE_LETTER)||((x)==CHARTYPE_USCORE))

    string next_identifier(ifstream& input)
    {
        string id;
        int ch, chtype;

        /*need to hardcode the first iteration because digits aren't allowed
          as the first character of an identifier, but they're perfectly
          legal for the rest of it*/
        ch = input.get();
        chtype = chartype(ch);
        if((chtype == CHARTYPE_LETTER) || (chtype == CHARTYPE_USCORE))
            id += (char)ch;
        else
        {
            /*put the character back so the stream is left where we found it*/
            input.unget();
            cerr << "ERROR: Expected identifier, found '" << (char)ch << "'" << endl;
            exit(-1);
        }

        /*note that the rest of the identifier can have digits as well.*/
        while(true)
        {
            ch = input.get();
            chtype = chartype(ch);
            if( IS_IDCHAR(chtype) )
                id += (char)ch;
            else
            {
                /*not part of the identifier, so leave it for the next reader*/
                input.unget();
                break;
            }
        }
        return id;
    }

Relatively straightforward, I'd say. We can write similar functions for other kinds of tokens just by replacing the IS_IDCHAR macro with a test for whatever we're trying to read. For example, we could use the following:
    #define IS_NUMBER(x) ((x)==CHARTYPE_DIGIT)

Slightly redundant for this case, but it makes our code a bit more readable, and we can copy-paste code and just change a few things to customize it for the task. (I also can't think of any other examples at the moment. I'll fix it when I figure one out.)
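To make the copy-and-customize idea concrete, here is a minimal sketch of a number reader built on IS_NUMBER. The name next_number() is hypothetical, and the chartype() and CHARTYPE_* definitions below are simplified stand-ins for the ones from the earlier parts, included only so the snippet compiles on its own. It also takes a plain istream and uses peek() instead of the get-then-put-back pattern above, which is just one way to do it.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Stand-ins for the character classes from the earlier parts (values assumed).
enum { CHARTYPE_OTHER, CHARTYPE_DIGIT, CHARTYPE_LETTER, CHARTYPE_USCORE };

// Simplified stand-in for the tutorial's chartype(); EOF falls through to OTHER.
int chartype(int ch)
{
    if (ch == '_') return CHARTYPE_USCORE;
    if (ch >= 0 && ch <= 255 && std::isdigit(ch)) return CHARTYPE_DIGIT;
    if (ch >= 0 && ch <= 255 && std::isalpha(ch)) return CHARTYPE_LETTER;
    return CHARTYPE_OTHER;
}

#define IS_NUMBER(x) ((x) == CHARTYPE_DIGIT)

// Hypothetical helper: reads a run of digits and leaves the stream positioned
// on the first character that is not a digit. Returns "" if there are none.
std::string next_number(std::istream& input)
{
    std::string digits;
    while (IS_NUMBER(chartype(input.peek())))
        digits += static_cast<char>(input.get());
    return digits;
}
```

Fed the text 123+4 through a std::istringstream, the first call should return "123" and leave the '+' unread for whatever handles operators.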
Using match()
We want to use this function when we're expecting something. For example, if we have a list with more than one item, we're going to expect an item, a comma, and then at least one other item. If the second item is missing, we have a syntax error.
    vector<string> identifier_list(ifstream& input)
    {
        string thisid;
        vector<string> idlist;

        while(true)
        {
            /*you should be able to write this on your own. trust me, it's easy*/
            skip_whitespace(input);
            thisid = next_identifier(input);
            if( thisid.compare("") == 0 )
            {
                /*expected an identifier and found none.*/
                cerr << "ERROR: Expected identifier, found something else." << endl;
                exit(-1);
            }
            else
                idlist.push_back(thisid);

            skip_whitespace(input);
            /*if we have a comma, then we expect another identifier. if there
              is no comma, then we can just return our list.*/
            if( !match(',') )
                return idlist;
        }
    }

Clearly we have a problem here. The way match() is written, it'll exit the program if it doesn't find a comma. Plus it returns void. We have to modify it to return a boolean instead: true if the character was found, false if it wasn't. It's not too difficult, so I'm leaving it as an exercise for the reader. I'd rather use the space to teach you something new than to rewrite old stuff.
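If you'd like something to compare your answer against, here is one possible shape for a boolean match(). Treat it as a sketch under assumptions: it takes the stream as an explicit first parameter, which may not be the signature used in the earlier part, and it uses peek() so that a non-matching character is never consumed.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical boolean match(): if the next character is `expected`, consume
// it and return true. Otherwise leave the character unread and return false.
bool match(std::istream& input, char expected)
{
    // peek() returns an int (EOF at end of input), so compare as ints.
    if (input.peek() != std::char_traits<char>::to_int_type(expected))
        return false;
    input.get();
    return true;
}
```

With this version a missing comma is no longer an error in itself. The caller decides what a false result means, which is what identifier_list() needs in order to know the list has ended.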
So now we know how to read lists, get identifiers, get character types, and expect characters. So what do we do now? Build a symbol table.
Symbol Tables
What's a symbol table?
Symbol tables are how we keep track of which variables (and later functions, structs, classes, etc.) have been declared, and where. If we don't keep track of them, then we have a language that automatically declares variables when it sees them, whether you've declared them yourself or not. Visual Basic is like this unless you include the directive Option Explicit. In my opinion, this is a bad idea. If you misspell a variable name, the compiler will assume that it's a different variable and declare it for you. Then you wonder why your program doesn't work. So what information do we need to include in this symbol table?
* Name
* Variable type
* File declared in
* Line declared on
For Nano, we don't really need the variable type, as I've mandated that all variables are integers. However, if we modify the language to allow other datatypes, we'll need this. Let's create a struct to hold all of this.
    enum DATATYPE { TYPE_INTEGER };

    typedef struct
    {
        /*note I'm not including the variable name in here.*/
        DATATYPE type;
        string file;
        unsigned int line;
    } SYMTAB_ENTRY;
We can use std::map to handle all of our symbol table needs. We can use the variable name as the key, and the struct as the value. So our declaration of the map would be something like:
    /*I'm assuming you have "#include <map>" as well as "using std::map;" somewhere.*/
    map<string, SYMTAB_ENTRY> symbol_table;

Now we can use the find() (or count()), insert(), and erase() methods to perform our work. But what do we do with it?
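As a rough illustration of the basic operations on such a table, here is a small self-contained sketch using std::map: find() for lookups, insert() for adding an entry, and erase() for removing one. The helper names is_declared() and declare() and the SYMTAB typedef are made up for this example and aren't part of the tutorial's code.

```cpp
#include <map>
#include <string>
#include <utility>

enum DATATYPE { TYPE_INTEGER };

struct SYMTAB_ENTRY
{
    DATATYPE type;
    std::string file;
    unsigned int line;
};

typedef std::map<std::string, SYMTAB_ENTRY> SYMTAB;

// Hypothetical helper: true if `name` is already in the table.
bool is_declared(const SYMTAB& table, const std::string& name)
{
    return table.find(name) != table.end();
}

// Hypothetical helper: records `name` as an integer variable.
// If `name` is already present, insert() leaves the old entry alone.
void declare(SYMTAB& table, const std::string& name,
             const std::string& file, unsigned int line)
{
    SYMTAB_ENTRY entry;
    entry.type = TYPE_INTEGER;
    entry.file = file;
    entry.line = line;
    table.insert(std::make_pair(name, entry));
}
```

Removing a symbol is a single call, table.erase(name), which will become useful once variables can go out of scope.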
Declaring Variables
In our statement() function (which we have yet to finish), we expect to be at the beginning of a new statement. After we've read in the next identifier, one of the checks we need to make is to see if it's "var", which is how we declare our variables. After that we expect a list of identifiers. So...
    void declare_variables(ifstream& input, map<string,SYMTAB_ENTRY>& symtab)
    {
        vector<string> varlist;
        vector<string>::iterator iter;
        SYMTAB_ENTRY stentry;

        /*we're assuming that the "var" token has already been consumed and
          the next token is an identifier.*/
        skip_whitespace(input);
        varlist = identifier_list(input);

        for(iter = varlist.begin(); iter != varlist.end(); ++iter)
        {
            /*make sure the variable isn't in our table already*/
            if( symtab.count(*iter) > 0 )
            {
                cerr << "ERROR: Duplicate declaration of '" << *iter << "'." << endl;
                exit(-1);
            }
            stentry.type = TYPE_INTEGER;
            stentry.file = "";  /*we'll figure this out later*/
            stentry.line = 0;   /*again, figure it out later*/
            /*std::map::insert() takes a key/value pair*/
            symtab.insert( std::make_pair(*iter, stentry) );
        }
    }
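One design note: std::map::insert() already reports whether the key was present, so the duplicate check and the insertion can be folded into a single step. The sketch below shows that variant on a plain list of names. The function declare_all() is hypothetical, and it returns the offending name instead of exiting so that it is easy to try out in isolation.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

enum DATATYPE { TYPE_INTEGER };

struct SYMTAB_ENTRY
{
    DATATYPE type;
    std::string file;
    unsigned int line;
};

// Hypothetical variant: adds every name to the table and returns the first
// duplicate it runs into, or "" if all of the names were new.
std::string declare_all(std::map<std::string, SYMTAB_ENTRY>& symtab,
                        const std::vector<std::string>& names)
{
    SYMTAB_ENTRY entry;
    entry.type = TYPE_INTEGER;
    entry.file = "";   // placeholder, as in the tutorial
    entry.line = 0;    // placeholder, as in the tutorial

    std::vector<std::string>::const_iterator it;
    for (it = names.begin(); it != names.end(); ++it)
    {
        // insert() returns a pair; .second is false if the key already existed.
        if (!symtab.insert(std::make_pair(*it, entry)).second)
            return *it;
    }
    return "";
}
```

Given the names a, b, a, this should return "a" and leave two entries in the table.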
That's enough for now. I don't know what I'm going to get to next time, so hold onto your socks until then. Hope you've enjoyed this latest installment--see you soon!
Edited by dargueta, 24 December 2009 - 08:28 AM.
Fixed minor error and added variable name in duplicate declaration