+ Reply to Thread
Page 1 of 3 123 LastLast
Results 1 to 10 of 30

Thread: Creating A Simple Compiler: Part 1

  1. #1
    Join Date
    Oct 2007
    Location
    /dev/null
    Posts
    4,513
    Blog Entries
    8
    Rep Power
    59

    Creating A Simple Compiler: Part 1

    Do you want to write your own compiler, but don't know how? Let me help you! I'm not exactly a master at this, but I'd like to share my meager knowledge with you.

    For this tutorial series, I'm going to assume you're capable of doing quite a bit more than a Hello World program. I'll explain stuff as needed, but I'm going to assume you know how to use printf, scanf, iostream and all that fun stuff. Oh, and as a warning, I have an incredibly bad habit of mixing C with C++. I'm going to try to keep to C++, but if I slip up, just slap me and I'll fix it.

    For this tutorial series we're going to write a very simple compiler for a language I shall call Nano, because it's small. Our compiler will generate a simple assembly language program from our Nano code. Neither Nano nor the assembly language output is meant to be terribly useful, but you'll be able to apply the concepts you learn to make a better compiler that is useful.

    Let's get started!
    First we have to define the syntax for our language. Syntax is the set of rules that define what you can do and what you can't. If you don't have rules, then writing a compiler is going to be a real pain in the butt, if not impossible. So we need to have rules.

    1) All statements will end in a semicolon.
    2) Whitespace is not significant, i.e. a+b is the same as a + b.
    3) Comments begin with /* and end with */ and they can extend across multiple lines.
    4) The only keywords are var, input, and output. Their function is as follows:
    a) var declares a new variable. All variables must be declared before they're used.
    b) input reads a variable from the console.
    c) output displays the value of a variable on the console.
    5) All variables are integers. (This will make our life easier.)
    6) The only operations allowed are +, -, *, /, %, and =. Parentheses may be used to force operation ordering.
    7) Variable names may only consist of uppercase and lowercase letters, underscores (_), and numbers. A variable name may not start with a number.

    (As you can see, this is a very restricted language. As we progress I'm going to amend the syntax to add if-else constructs and all that fun stuff you're used to.)

    Just so you can see an example of what a Nano program might look like:
    Code:
    var a, b, c;
    /*this is a comment*/
    
    a = 5;
    b = 6;
    c = (a * b)  + (a / b);
    We need some way of expressing the rules of our syntax in a more systematic format that'll make writing our compiler easier. Allow me to introduce you to context-free grammar. A context-free grammar is a set of production rules, or a list of tokens and their possible evaluations. There are two parts to every production rule: the variable, and its resolutions. Variables can resolve to other variables, or they can resolve to a single terminal--a token that has no further resolutions. Let me show you what I mean. Here's a very simple grammar:

    Code:
    <START> ::= "The"  <PERSON>  <VERB>  "the man"
    <PERSON> ::= "dog "  |  "cat "
    <VERB> ::= "ate "  |  "bit "
    The goal of a context-free grammar is to produce a set of terminals. We always begin with START. Now we have "The", which is a terminal, a dead end. But that's not the end of the rule. We have a variable--two actually--that we need to resolve. So we go on to PERSON.
    Here we have a choice between "dog " and "cat ." I like dogs, so I'm going to pick "dog." Now we have "The dog " VERB "the man." We need to resolve VERB. So we look up VERB, and find we have two possibilities: "ate " and "bit ." Just for kicks I'll pick "ate ." So now we have "The dog ate the man." No variables left, so we have a valid token!
    Other possibilities include "The cat ate the man," "The dog bit the man," and so on.

    So how do we apply this to expressing the syntax of a language? Let's start with something simple. Why don't we define what an integer can be? First, what is an integer? It's a series of digits, from 0 to 9 inclusive. (We're only going to deal with decimal for now.) So why don't we make a CFG rule for a number:

    Code:
    <DIGIT> ::= 0 .. 9
    <NUMBER> ::= <DIGIT>+
    Note: The trailing + means "one or more." We'll use a * to indicate "zero or more," and ? to indicate "zero or one." To indicate a range we will use .., so a .. c expands to a b c. I'm lazy.

    Simple enough, right? Let's create a few more:
    Code:
    <LETTER> ::= 'A' .. 'Z' | 'a' .. 'z'
    <IDCHAR> := <LETTER> | '_'
    <VARIABLE> ::= <IDCHAR>+  (<DIGIT> | <IDCHAR>)*
    The last rule is a little complicated. The first part says we must have one or more IDCHARs at the beginning of an identifier, followed by any combination of IDCHARs and digits. So a few example resolutions would be foo, __sprintf, a_b_1578249367890, and so on.

    Why don't we try something harder? How about addition?
    Code:
    <ADDITION> ::= (<VARIABLE> '+' <NUMBER>) | \
                    (<NUMBER> '+' <NUMBER>) | \
                    (<VARIABLE> '+' <VARIABLE>) | \
                    (<NUMBER> '+' <VARIABLE>)
    Do you see how everything builds on the basic definitions of a variable and digit? We can extend this to do variable declarations, all possible arithmetic operations, and more.

    Well, I think that's enough for today. I'll present you with a full CFG for Nano next time, and then we'll begin with constructing basic parsing tools. Hope you enjoyed my tutorial!
    Last edited by dargueta; 12-17-2009 at 08:19 PM. Reason: Switched to BNF instead of using my own retarded format
    sudo rm -rf /

  2. CODECALL Circuit advertisement
    Join Date
    Always
    Location
    Advertising world
    Posts
    Many

     
  3. #2
    Join Date
    Jul 2006
    Posts
    16,478
    Blog Entries
    75
    Rep Power
    143

    Re: Creating A Simple Compiler: Part 1

    Nice intro to how to define a language. +rep
    Programming is a branch of mathematics.
    My CodeCall Blog | My Personal Blog

  4. #3
    Join Date
    Oct 2007
    Location
    /dev/null
    Posts
    4,513
    Blog Entries
    8
    Rep Power
    59

    Re: Creating A Simple Compiler: Part 1

    Wow, that was fast. Thanks!
    sudo rm -rf /

  5. #4
    Join Date
    Jul 2009
    Location
    Santa Clarita, CA
    Posts
    2,111
    Blog Entries
    47
    Rep Power
    31

    Re: Creating A Simple Compiler: Part 1

    Awesome job, I'm going to really like this series!

    ++rep
    Wow I changed my sig!

  6. #5
    Join Date
    Jul 2006
    Posts
    16,478
    Blog Entries
    75
    Rep Power
    143

    Re: Creating A Simple Compiler: Part 1

    Hovering while I kill our spam-deluge will do that
    Programming is a branch of mathematics.
    My CodeCall Blog | My Personal Blog

  7. #6
    Join Date
    Oct 2007
    Location
    /dev/null
    Posts
    4,513
    Blog Entries
    8
    Rep Power
    59

    Re: Creating A Simple Compiler: Part 1

    Ah, yes. There should be some autoreport thing that deletes posts containing words like viagra and such if the post count is below a certain number (say 100?). That way I can deride my fellow CodeCallers in the lounge without my posts getting deleted.
    sudo rm -rf /

  8. #7
    Join Date
    Nov 2009
    Location
    London
    Posts
    866
    Blog Entries
    3
    Rep Power
    14

    Re: Creating A Simple Compiler: Part 1

    Awesome tutorial, will def find this series very useful. +rep

    I'll work on the autoreport feature too, sounds like a good idea.

  9. #8
    Join Date
    Sep 2009
    Location
    USA
    Posts
    3,400
    Blog Entries
    5
    Rep Power
    37

    Re: Creating A Simple Compiler: Part 1



    This is so cool
    Root Beer == System Administrator's Beer
    Download the new operating system programming kit! (some assembly required)

  10. #9
    Join Date
    Oct 2007
    Location
    /dev/null
    Posts
    4,513
    Blog Entries
    8
    Rep Power
    59

    Re: Creating A Simple Compiler: Part 1

    Aww...I feel special. Thanks!
    sudo rm -rf /

  11. #10
    Join Date
    Jul 2006
    Location
    Amherst, New York, United States
    Posts
    6,277
    Blog Entries
    26
    Rep Power
    20

    Re: Creating A Simple Compiler: Part 1

    Very nice. My only recommendation would be to use BNF/EBNF to represent your grammar. I think it would be easier to read.

+ Reply to Thread
Page 1 of 3 123 LastLast

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. Intermediate Creating A Simple Compiler: Part 6
    By dargueta in forum C Tutorials
    Replies: 10
    Last Post: 03-04-2011, 12:02 PM
  2. Creating a Simple Compiler: Part 5
    By dargueta in forum C Tutorials
    Replies: 1
    Last Post: 08-08-2010, 10:58 AM
  3. Creating A Simple Compiler: Part 4
    By dargueta in forum C Tutorials
    Replies: 7
    Last Post: 04-19-2010, 04:19 PM
  4. Creating A Simple Compiler: Part 2
    By dargueta in forum C Tutorials
    Replies: 3
    Last Post: 01-08-2010, 12:25 AM
  5. Creating A Simple Compiler: Part 3
    By dargueta in forum C Tutorials
    Replies: 4
    Last Post: 01-06-2010, 03:57 PM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts