+ Reply to Thread
Page 1 of 2
1 2 LastLast
Results 1 to 10 of 16

Thread: Creating A Simple Compiler: Part 1

  1. #1
    Code Warrior dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta's Avatar
    Join Date
    Oct 2007
    Age
    19
    Posts
    2,816
    Blog Entries
    8

    Creating A Simple Compiler: Part 1

    Do you want to write your own compiler, but don't know how? Let me help you! I'm not exactly a master at this, but I'd like to share my meager knowledge with you.

    For this tutorial series, I'm going to assume you're capable of doing quite a bit more than a Hello World program. I'll explain stuff as needed, but I'm going to assume you know how to use printf, scanf, iostream and all that fun stuff. Oh, and as a warning, I have an incredibly bad habit of mixing C with C++. I'm going to try to keep to C++, but if I slip up, just slap me and I'll fix it.

    For this tutorial series we're going to write a very simple compiler for a language I shall call Nano, because it's small. Our compiler will generate a simple assembly language program from our Nano code. Neither Nano nor the assembly language output is meant to be terribly useful, but you'll be able to apply the concepts you learn to make a better compiler that is useful.

    Let's get started!
    First we have to define the syntax for our language. Syntax is the set of rules that define what you can do and what you can't. If you don't have rules, then writing a compiler is going to be a real pain in the butt, if not impossible. So we need to have rules.

    1) All statements will end in a semicolon.
    2) Whitespace is not significant, i.e. a+b is the same as a + b.
    3) Comments begin with /* and end with */ and they can extend across multiple lines.
    4) The only keywords are var, input, and output. Their function is as follows:
    a) var declares a new variable. All variables must be declared before they're used.
    b) input reads a variable from the console.
    c) output displays the value of a variable on the console.
    5) All variables are integers. (This will make our life easier.)
    6) The only operations allowed are +, -, *, /, %, and =. Parentheses may be used to force operation ordering.
    7) Variable names may only consist of uppercase and lowercase letters, underscores (_), and numbers. A variable name may not start with a number.

    (As you can see, this is a very restricted language. As we progress I'm going to amend the syntax to add if-else constructs and all that fun stuff you're used to.)

    Just so you can see an example of what a Nano program might look like:
    Code:
    var a, b, c;
    /*this is a comment*/
    
    a = 5;
    b = 6;
    c = (a * b)  + (a / b);
    We need some way of expressing the rules of our syntax in a more systematic format that'll make writing our compiler easier. Allow me to introduce you to context-free grammar. A context-free grammar is a set of production rules, or a list of tokens and their possible evaluations. There are two parts to every production rule: the variable, and its resolutions. Variables can resolve to other variables, or they can resolve to a single terminal--a token that has no further resolutions. Let me show you what I mean. Here's a very simple grammar:

    Code:
    <START> ::= "The"  <PERSON>  <VERB>  "the man"
    <PERSON> ::= "dog "  |  "cat "
    <VERB> ::= "ate "  |  "bit "
    The goal of a context-free grammar is to produce a set of terminals. We always begin with START. Now we have "The", which is a terminal, a dead end. But that's not the end of the rule. We have a variable--two actually--that we need to resolve. So we go on to PERSON.
    Here we have a choice between "dog " and "cat ." I like dogs, so I'm going to pick "dog." Now we have "The dog " VERB "the man." We need to resolve VERB. So we look up VERB, and find we have two possibilities: "ate " and "bit ." Just for kicks I'll pick "ate ." So now we have "The dog ate the man." No variables left, so we have a valid token!
    Other possibilities include "The cat ate the man," "The dog bit the man," and so on.

    So how do we apply this to expressing the syntax of a language? Let's start with something simple. Why don't we define what an integer can be? First, what is an integer? It's a series of digits, from 0 to 9 inclusive. (We're only going to deal with decimal for now.) So why don't we make a CFG rule for a number:

    Code:
    <DIGIT> ::= 0 .. 9
    <NUMBER> ::= <DIGIT>+
    Note: The trailing + means "one or more." We'll use a * to indicate "zero or more," and ? to indicate "zero or one." To indicate a range we will use .., so a .. c expands to a b c. I'm lazy.

    Simple enough, right? Let's create a few more:
    Code:
    <LETTER> ::= 'A' .. 'Z' | 'a' .. 'z'
    <IDCHAR> := <LETTER> | '_'
    <VARIABLE> ::= <IDCHAR>+  (<DIGIT> | <IDCHAR>)*
    The last rule is a little complicated. The first part says we must have one or more IDCHARs at the beginning of an identifier, followed by any combination of IDCHARs and digits. So a few example resolutions would be foo, __sprintf, a_b_1578249367890, and so on.

    Why don't we try something harder? How about addition?
    Code:
    <ADDITION> ::= (<VARIABLE> '+' <NUMBER>) | \
                    (<NUMBER> '+' <NUMBER>) | \
                    (<VARIABLE> '+' <VARIABLE>) | \
                    (<NUMBER> '+' <VARIABLE>)
    Do you see how everything builds on the basic definitions of a variable and digit? We can extend this to do variable declarations, all possible arithmetic operations, and more.

    Well, I think that's enough for today. I'll present you with a full CFG for Nano next time, and then we'll begin with constructing basic parsing tools. Hope you enjoyed my tutorial!
    Last edited by dargueta; 12-17-2009 at 10:19 PM. Reason: Switched to BNF instead of using my own retarded format
    dargueta@dargueta-laptop:~$ sudo rm -rf /media/windows-partition

  2. #2
    Super Moderator WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther's Avatar
    Join Date
    Jul 2006
    Age
    36
    Posts
    11,640
    Blog Entries
    57

    Re: Creating A Simple Compiler: Part 1

    Nice intro to how to define a language. +rep
    CodeCall Blog | CodeCall Wiki | Shareware
    Programming is a branch of mathematics.
    My CodeCall Blog | My Personal Blog

  3. #3
    Code Warrior dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta's Avatar
    Join Date
    Oct 2007
    Age
    19
    Posts
    2,816
    Blog Entries
    8

    Re: Creating A Simple Compiler: Part 1

    Wow, that was fast. Thanks!
    dargueta@dargueta-laptop:~$ sudo rm -rf /media/windows-partition

  4. #4
    Moderator ZekeDragon is a name known to all ZekeDragon is a name known to all ZekeDragon is a name known to all ZekeDragon is a name known to all ZekeDragon is a name known to all ZekeDragon is a name known to all ZekeDragon's Avatar
    Join Date
    Jul 2009
    Location
    Nowhere, Washington
    Posts
    1,755
    Blog Entries
    39

    Re: Creating A Simple Compiler: Part 1

    Awesome job, I'm going to really like this series!

    ++rep
    Should I get a userbar here?

  5. #5
    Super Moderator WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther's Avatar
    Join Date
    Jul 2006
    Age
    36
    Posts
    11,640
    Blog Entries
    57

    Re: Creating A Simple Compiler: Part 1

    Hovering while I kill our spam-deluge will do that
    CodeCall Blog | CodeCall Wiki | Shareware
    Programming is a branch of mathematics.
    My CodeCall Blog | My Personal Blog

  6. #6
    Code Warrior dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta's Avatar
    Join Date
    Oct 2007
    Age
    19
    Posts
    2,816
    Blog Entries
    8

    Re: Creating A Simple Compiler: Part 1

    Ah, yes. There should be some autoreport thing that deletes posts containing words like viagra and such if the post count is below a certain number (say 100?). That way I can deride my fellow CodeCallers in the lounge without my posts getting deleted.
    dargueta@dargueta-laptop:~$ sudo rm -rf /media/windows-partition

  7. #7
    Administrator James.H is on a distinguished road James.H's Avatar
    Join Date
    Nov 2009
    Location
    London
    Posts
    657
    Blog Entries
    1

    Re: Creating A Simple Compiler: Part 1

    Awesome tutorial, will def find this series very useful. +rep

    I'll work on the autoreport feature too, sounds like a good idea.

  8. #8
    Code Warrior Guest is a name known to all Guest is a name known to all Guest is a name known to all Guest is a name known to all Guest is a name known to all Guest is a name known to all Guest's Avatar
    Join Date
    Sep 2009
    Location
    United States
    Age
    16
    Posts
    2,589
    Blog Entries
    2

    Re: Creating A Simple Compiler: Part 1



    This is so cool
    Vista users shouldn't have sigbars.
    < Holy cow! Two images!

  9. #9
    Code Warrior dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta has much to be proud of dargueta's Avatar
    Join Date
    Oct 2007
    Age
    19
    Posts
    2,816
    Blog Entries
    8

    Re: Creating A Simple Compiler: Part 1

    Aww...I feel special. Thanks!
    dargueta@dargueta-laptop:~$ sudo rm -rf /media/windows-partition

  10. #10
    Co-Administrator John is a glorious beacon of light John is a glorious beacon of light John is a glorious beacon of light John is a glorious beacon of light John is a glorious beacon of light John's Avatar
    Join Date
    Jul 2006
    Age
    21
    Posts
    5,882
    Blog Entries
    25

    Re: Creating A Simple Compiler: Part 1

    Very nice. My only recommendation would be to use BNF/EBNF to represent your grammar. I think it would be easier to read.

+ Reply to Thread
Page 1 of 2
1 2 LastLast

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

     

Similar Threads

  1. Smarty - Part 3 - Creating tables and lists
    By Orjan in forum PHP Tutorials
    Replies: 2
    Last Post: 10-18-2009, 05:46 PM
  2. Detect Character Set
    By dargueta in forum General Programming
    Replies: 14
    Last Post: 09-10-2009, 04:55 PM
  3. Replies: 10
    Last Post: 08-28-2009, 09:35 PM
  4. Java: Random Name Generator
    By Sinipull in forum Classes and Code Snippets
    Replies: 0
    Last Post: 08-05-2009, 07:43 AM
  5. Replies: 6
    Last Post: 07-20-2009, 05:27 PM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts