Hello guys, glad to be here. Nice forum!
I want to write a compiler for a simple language, for learning purposes. Here is the grammar:
S -> expr | int | float
expr -> <int> + <int> | <float> + <float>
Now that I know the grammar for my language, can someone guide me on what to do next? I know there are a lot of books out there, like flex & yacc and compiler textbooks, but I think I will learn a lot more if I break the ice with something simple first. These books confuse me, so please help this guy out.
By the way, I am an experienced programmer, but in other areas. I am familiar with grammars, finite state machines, assembly language as well as high-level languages, and more, so I can follow along if you guide me a little.
Thank you
Are you familiar with the concept of tokenizing a string?
Yep, I am familiar with all the stages of the compilation process, like lexical, syntactic, and semantic analysis, but only in theory. I am reading this big book on compilers:
Compilers: Principles, Techniques, and Tools
I am at page 200 or so and will continue, but I've got to implement that theory in a simple exercise like this first.
But I'm having difficulty getting started on writing one...
Thank you for helping!
First thing I would do is expand your grammar to make it easier:
S -> expr | int | float
expr -> <int> + <int> | <float> + <float> | <int> + <float> | <float> + <int>
int -> digit<int> | digit
float -> <int>.<int>
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
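One way to see how the expanded grammar turns into code is a hand-written recursive-descent recognizer, with roughly one method per rule. The sketch below is in Python (the language the original poster mentions later) and is only a recognizer: it answers yes or no and builds nothing. Two liberties are taken: the four `expr` alternatives are merged into `number + number`, because `int` and `float` both start with an `int`, and blank spaces are tolerated around `+` even though the grammar does not mention them. The class and function names are illustrative, not from any library.

```python
DIGITS = set("0123456789")


class Recognizer:
    """Recursive-descent recognizer: roughly one method per grammar rule."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        # Current character, or "" once the input is used up.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def skip_spaces(self) -> None:
        # Assumption: blanks between tokens are ignored (the grammar omits them).
        while self.peek() == " ":
            self.pos += 1

    def digit(self) -> bool:
        # digit -> 0 | 1 | ... | 9
        if self.peek() in DIGITS:
            self.pos += 1
            return True
        return False

    def int_(self) -> bool:
        # int -> digit<int> | digit, i.e. one or more digits
        if not self.digit():
            return False
        while self.digit():
            pass
        return True

    def number(self) -> bool:
        # <int> or <float>, where float -> <int>.<int>; both start with <int>
        if not self.int_():
            return False
        if self.peek() == ".":
            self.pos += 1
            return self.int_()
        return True

    def s(self) -> bool:
        # S -> expr | int | float, where every expr alternative is number + number
        if not self.number():
            return False
        self.skip_spaces()
        if self.peek() == "+":
            self.pos += 1
            self.skip_spaces()
            if not self.number():
                return False
        return self.pos == len(self.text)  # the whole input must be consumed


def accepts(text: str) -> bool:
    """Return True if the text is a sentence of the grammar."""
    return Recognizer(text.strip()).s()
```

Because each method mirrors one production, extending the grammar (say, adding `-`) mostly means adding or extending a method.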
Yep, I thought that was implied.
So what should I do next?
Thank you so much for helping out
Next step would be to correctly identify digit tokens, then int tokens, etc.
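Since the original poster knows finite state machines, "identify digit tokens, then int tokens" can be read as running a small DFA over each lexeme. A minimal sketch in Python, assuming the lexeme has already been cut out of the input; the state names and the `INT`/`FLOAT` labels are made up for illustration:

```python
from typing import Optional

DIGITS = set("0123456789")

# Transition table of a small DFA: (state, character class) -> next state.
TRANSITIONS = {
    ("start", "digit"): "int",
    ("int", "digit"): "int",
    ("int", "dot"): "dot",
    ("dot", "digit"): "frac",
    ("frac", "digit"): "frac",
}

# Accepting states and the token kind each one produces.
ACCEPTING = {"int": "INT", "frac": "FLOAT"}


def char_class(ch: str) -> str:
    """Map a character to the input symbol the DFA works with."""
    if ch in DIGITS:
        return "digit"
    if ch == ".":
        return "dot"
    return "other"


def classify(lexeme: str) -> Optional[str]:
    """Run the DFA over the lexeme; return 'INT', 'FLOAT', or None if rejected."""
    state = "start"
    for ch in lexeme:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:  # no transition: the lexeme is not a number
            return None
    # Ending in a non-accepting state ("start" or "dot") also means rejection.
    return ACCEPTING.get(state)
```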
Can you please explain what to do in one post, instead of in fragments like this?
Yep, I know a lot of this stuff, but I still can't connect the dots.
I mean, now that I have a grammar, do I create abstract data structures, types, keywords? Do I translate the code of my language to assembly? To machine code? Maybe to another high-level language?
Identify tokens? Yes, but how? Using a match command to recognize them, maybe some switch statements here and there? Can you write a short code example here to explain?
Thanks a lot for helping.
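As a rough answer to the "how do I identify tokens?" question, here is a regular-expression tokenizer in Python, modeled on the tokenizer recipe in the Python `re` documentation. The token names (`INT`, `FLOAT`, `PLUS`) are hypothetical. The important detail is that the float pattern is tried before the int pattern; otherwise `3.5` would come out as `3` followed by an error at the dot.

```python
import re
from typing import List, Tuple

# Token patterns, tried left to right; FLOAT must come before INT.
TOKEN_SPEC = [
    ("FLOAT", r"[0-9]+\.[0-9]+"),
    ("INT", r"[0-9]+"),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),      # whitespace is dropped
    ("MISMATCH", r"."),    # anything else is an error
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split a line of source into (kind, text) pairs."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup  # name of the alternative that matched
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {m.group()!r} at position {m.start()}")
        tokens.append((kind, m.group()))
    return tokens
```

Here `re.finditer` plays the role of the "match command", and the `if kind == ...` chain is the switch statement.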
I should mention that I have not actually written a compiler myself, though I have helped a friend in a class where he wrote one. Even so, I suspect that your grammar is not complete.
Right now, all you support is addition. That really isn't a programming language. I would suggest you start by thinking about something like a Pascal to C "compiler". If you have access to Bjarne Stroustrup's "The C++ Programming Language", you'll find the complete grammar for C++ takes several PAGES to list. Do you have a source language in mind, or are you trying to define it?
My sense is that you're getting ahead of yourself in your eagerness to have some code. The last compiler I helped with had a tokenizer, a tree to store tokens, and several other features. If you are trying to write code at this point, you have just leapfrogged past all the required planning and will get nowhere.
I would start with the following:
1) What is your source language?
2) What are some common statements in your source language?
3) What are some common complicated statements in your source language?
4) What is your definition for your source language?
5) Does your definition account for all the statements you've listed?
6) Can you think of any statements you may have missed?
7) Can you use what you have so far to create syntax highlighting rules in gVIM? (This will help you make sure you've properly identified literals, keywords, blocks, etc.)
At this point, you probably have a valid language definition. Then you can continue with:
8) What is your target language?
9) Does your target language have similar language constructs to your source language?
10) Does your target language have any language constructs that are different from your source language?
11) Is your target language missing any language constructs from your source language?
12) Does your target language have different limitations from your source language? (Java to C++ would have issues with interfaces, precise limits on built-in data types, etc. C++ to Java would have issues with multiple inheritance, operator overloading, etc.)
This will help you be sure you know what kinds of challenges you will be facing, in general. You cannot begin to write a compiler until you've dealt with this. Then you can start dealing with actual analysis issues:
13) How will you approach parsing? Regular expressions? Tokenizing? Character by character analysis? Substring analysis?
14) How will you store parsing information? Does the type of statement affect how you have to store it? Will you need to store entire programs, or just individual statements? How will you handle statement blocks?
15) What language will you write your compiler in? Will it easily support your strategy so far?
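For question 14, one common answer is a small tree (an abstract syntax tree) whose node types follow the grammar. A minimal sketch in Python for the single-`+` grammar posted earlier in this thread, assuming the tokenizer hands over `(kind, text)` pairs such as `("INT", "4")`; `Num`, `Add`, and `parse` are hypothetical names:

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Num:
    """Leaf node: a literal, already converted to int or float."""
    value: Union[int, float]


@dataclass
class Add:
    """Interior node for `left + right`."""
    left: Num
    right: Num


Node = Union[Num, Add]


def parse(tokens: List[Tuple[str, str]]) -> Node:
    """Build a tree for S -> number | number + number from (kind, text) tokens."""

    def number(i: int) -> Num:
        if i >= len(tokens) or tokens[i][0] not in ("INT", "FLOAT"):
            raise SyntaxError(f"expected a number at token {i}")
        kind, text = tokens[i]
        return Num(int(text) if kind == "INT" else float(text))

    left = number(0)
    if len(tokens) == 1:
        return left
    if len(tokens) == 3 and tokens[1][0] == "PLUS":
        return Add(left, number(2))
    raise SyntaxError("expected `number` or `number + number`")
```

Converting literals to `int` or `float` while building the tree means later stages (evaluation or code generation) never have to look at strings again.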
For reference, you may want to look at the source code for GCC and FPC. They should give you lots of examples of strategies you can use when writing a compiler. Another possibility would be to write an interpreter, rather than a compiler. QuickBasic and Pascal both have pretty simple syntax that might be useful for a source language.
Unfortunately, I can't offer any code at this point, because it appears to be too early in the design process for that. "The C++ Programming Language" has a simple parser/tokenizer for a calculator that follows order of operations. The code and explanation for it are about 13 pages long. A full compiler will be significantly longer, including all of the design work and documentation it requires.
First of all, thank you again for helping. I am reading a lot and trying hard, so thank you, really. Here are some answers:
1) What is your source language?
I want to start extremely simple.
GRAMMAR:
S -> expr | int | float
expr -> <int> <op> <int> | <float> <op> <float> | <int> <op> <float> | <float> <op> <int>
int -> digit<int> | digit
float -> <int>.<int>
<op> -> +
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
VALID LANGUAGE EXAMPLES:
12
2231.2
2 + 2
4 + 3.5
1.4 + 2.1
OPERATIONS:
add(int|float, int|float);
print(string);
So my language basically will be a simple calculator that just adds floats and ints.
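Because the language is just "a number, optionally plus another number", an interpreter (the alternative suggested earlier in the thread) fits in a few lines. A sketch in Python, assuming blanks around `+` are allowed; `run` and `PROGRAM_RE` are illustrative names, and the regular expression is simply the grammar above written as one pattern:

```python
import re
from typing import Union

NUMBER = r"([0-9]+(?:\.[0-9]+)?)"  # <int> or <int>.<int>
# A whole program: one number, optionally followed by "+ number".
PROGRAM_RE = re.compile(rf"\s*{NUMBER}\s*(?:\+\s*{NUMBER}\s*)?")


def to_number(text: str) -> Union[int, float]:
    # The grammar's float rule requires a '.', so its presence decides the type.
    return float(text) if "." in text else int(text)


def run(source: str) -> Union[int, float]:
    """Evaluate one program of the calculator language."""
    m = PROGRAM_RE.fullmatch(source)
    if m is None:
        raise SyntaxError(f"not a valid program: {source!r}")
    left, right = m.group(1), m.group(2)
    result = to_number(left)
    if right is not None:
        # Python keeps int + int as int and promotes to float if either side is float.
        result = result + to_number(right)
    return result
```

`print(run("4 + 3.5"))` would correspond to the `print` operation listed above. Python's own int/float promotion gives the mixing rules for free; a compiler targeting assembly has to spell them out.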
What I need to decide now is which assembler I should target. Last time I learned the HLA assembler. Do you have a better suggestion?
2) What are some common statements in your source language?
The above. Just numerical stuff.
3) What are some common complicated statements in your source language?
None.
8) What is your target language?
I guess HLA, as I said above, but I am open to suggestions for other assemblers.
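Whichever assembler is chosen, the back end is mostly a function that walks the parsed program and emits text. The Python sketch below targets an invented stack machine (`PUSHI`, `PUSHF`, `ITOF`, `ADDI`, `ADDF`, `PRINTI`, `PRINTF` are hypothetical instructions, not HLA). The part that carries over to HLA or any real assembler is deciding when an int operand must be converted to float before the addition.

```python
import re
from typing import List

NUMBER = r"([0-9]+(?:\.[0-9]+)?)"
PROGRAM_RE = re.compile(rf"\s*{NUMBER}\s*(?:\+\s*{NUMBER}\s*)?")


def generate(source: str) -> List[str]:
    """Translate one program into instructions for a hypothetical stack machine."""
    m = PROGRAM_RE.fullmatch(source)
    if m is None:
        raise SyntaxError(f"not a valid program: {source!r}")
    operands = [g for g in m.groups() if g is not None]
    # If any operand is a float, the whole computation is done in floating point.
    use_float = any("." in op for op in operands)

    code: List[str] = []
    for op in operands:
        if "." in op:
            code.append(f"PUSHF {op}")
        else:
            code.append(f"PUSHI {op}")
            if use_float:
                code.append("ITOF")  # convert the int on top of the stack to float
    if len(operands) == 2:
        code.append("ADDF" if use_float else "ADDI")
    code.append("PRINTF" if use_float else "PRINTI")
    return code
```

For example, `4 + 3.5` becomes `PUSHI 4`, `ITOF`, `PUSHF 3.5`, `ADDF`, `PRINTF`. Swapping in a real target means changing the strings, not the structure.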
13) How will you approach parsing? Regular expressions? Tokenizing? Character by character analysis? Substring analysis?
I am reading flex & yacc for that right now, along with Modern Compiler Implementation in Java.
Yep, I saw the c++ grammar before, and I know what to do. Just in small steps.
So for now, just let me know what you think about question 8.
And I don't know what to write it in yet. Maybe Python, because it handles strings really easily, so it will save me a lot of work. By the way, can my compiler automatically call the assembler and hand the file over to it? Right now it would be a two-step process: I give the file to my compiler, my compiler produces an output text file ready for the assembler, and then I take that file and give it to the assembler to turn into object code. Can I somehow make that one step? Maybe include the assembler in my project?
Regards
I'm not familiar with HLA. If you translate to HLA, you should be able to use the system() call (available in most languages) from inside your compiler to run the HLA assembler on the generated code.
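In Python, the usual replacement for `system()` is `subprocess.run`. The sketch below does the whole pipeline in one call: translate, write the `.hla` file, run the assembler. It assumes the assembler is invoked as `hla <file>`; `compile_and_assemble` and the `translate` callback are hypothetical placeholders for the poster's own code.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def compile_and_assemble(
    source_path: str,
    translate: Callable[[str], str],
    assembler: Sequence[str] = ("hla",),
) -> Path:
    """Translate a source file, write the assembly next to it, then assemble it.

    `translate` stands in for your own compiler (source text -> assembly text);
    the default assembler command `hla <file>` is an assumption.
    """
    src = Path(source_path)
    asm_path = src.with_suffix(".hla")
    asm_path.write_text(translate(src.read_text()))
    # Like system("hla prog.hla"), but without a shell; check=True raises
    # subprocess.CalledProcessError if the assembler exits with an error.
    subprocess.run([*assembler, str(asm_path)], check=True)
    return asm_path
```

Passing the command as a list avoids shell quoting problems, and `check=True` turns a failed assembly into a Python exception instead of a silently ignored exit code.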