I'm not sure if I'm posting this in the right place, but here we go anyway.
I'm going to be designing a code parser/interpreter (something new to me) and I would like some feedback on a couple of things.
1) I've read a few tutorials that I found via Google searches, and I don't understand the value of converting code into tokens. Perhaps I just missed it, but how is this important to the process?
2) I want my interpreter to compile the code that it reads. The question here is should I design a byte code language, or use machine language?
I could not find any good resources, tutorials, or accurate documentation on machine language or the design and use of a byte code language. Does anyone here know these subjects, or could at least point me to a good website or book?
Any feedback or general advice would be greatly appreciated. Direct answers to my questions would be fantastic.
Thanks in advance.
Code Parsing Questions
Started by roboticforest, Dec 28 2008 01:41 AM
6 replies to this topic
#1
Posted 28 December 2008 - 01:41 AM
Dave
#2
Posted 28 December 2008 - 05:40 AM
1) Tokenizing the input string is a standard approach to parsing. It just means you break your input string into small chunks (tokens) and then compare those tokens to the rules for your language. I wouldn't try to parse anything significant without using tokens.
2) Machine language depends on the OS/hardware. Byte code will need an interpreter to run it (like the Java VM).
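To make the tokenizing step concrete, here is a minimal sketch in Python. The token names and the tiny arithmetic grammar are assumptions made up for illustration, not anything from the thread; a real language would need more rules.

```python
import re

# Hypothetical token rules for a tiny arithmetic language (illustrative only).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Split source text into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("x = 3 + 42"))
```

Once the input is a list like this, the parser only has to check sequences of token kinds against the grammar instead of inspecting individual characters.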
#3
Posted 28 December 2008 - 08:39 AM
Thank you for the reply Panther.
1) I think I'm starting to see where tokens could be useful now.
2) I already understood what they were; what I'm looking for is info on how they work. I want to understand the behind-the-scenes details of how compiled code is read by the computer, especially if I'm going to go the route of designing a new byte code language myself.
I already know that if I compile to a byte code language I would need to write a second interpreter (one for my programming language, the second for the byte codes), yet this could allow my program more portability. I believe a simpler solution (and more than enough for my project right now) would be to compile to machine language. My goal is basic x86.
Note that using NASM or some other free assembler or compiler will not work for my project. I need to make the parser and compiler myself. Also, I would really like to avoid compiling to a middle language (assembly, for example) before compiling down even further to byte code or machine code.
Again, any thoughts on where I can get this kind of information? Or, any personal experience you can share?
Thanks again.
Dave
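As a rough illustration of what "a second interpreter for the byte codes" involves, here is a minimal sketch of a stack-based byte code loop in Python. The opcode names and numbering are invented for this example and do not come from any real VM.

```python
# Hypothetical opcodes for a tiny stack machine; the numbering is arbitrary.
PUSH, ADD, MUL, HALT = 0, 1, 2, 3


def run(bytecode):
    """Execute a list of integers as byte code and return the top of the stack."""
    stack = []
    pc = 0  # program counter: index of the next instruction
    while True:
        op = bytecode[pc]
        pc += 1
        if op == PUSH:
            stack.append(bytecode[pc])  # the operand follows the opcode
            pc += 1
        elif op == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op} at offset {pc - 1}")


# Byte code for (2 + 3) * 4
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
print(run(program))
```

The compiler's job is then to emit a list like `program` from the parsed source; portability comes from the fact that only `run` has to be rewritten for a new platform.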
#4
Posted 28 December 2008 - 04:04 PM
Actually, many compilers compile to ASM, and then an assembler goes from ASM to machine code. If you have a debugging mode, it will show you the ASM that was produced. Personally, I would compile to ASM for right now, and let a standard assembler finish going to machine code. Java is open-sourced now, so you may be able to look at the VM code to get an idea of what's involved.
#5
Posted 29 December 2008 - 12:18 PM
I know that too, but as I said before I can't use a standard compiler; I'm making one. I will of course be using a standard compiler to make my compiler. :-)
Since I'm making my own compiler it would be nice (for efficiency) if my compiler didn't translate to a middle language like assembly, but straight to byte code, or machine language. The reason I want this is so that the code can be run the moment the user presses a parse button, or enter on his keyboard.
Thanks for the tip about Java! I didn't know that it had been made open source, and I plan to take a look at that as soon as I can.
Still, are there any books, or online resources I could use as well that explain how machine language, or byte code languages work?
So far the best I've found is a file explaining (and not very well) what each assembly command compiles to, but it lacks detail. For example, it doesn't explain how variables are compiled, or how references to CPU registers are compiled. That's why I'm looking for a tutorial on machine language.
Dave
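On the question of how references to CPU registers are compiled, here is a small sketch in Python that covers one case only: 32-bit x86 `mov register, immediate`, which is encoded as the opcode byte 0xB8 plus the register number, followed by the value as four little-endian bytes. The function names are made up for this example, and variables held in memory use other encodings not shown here.

```python
import struct

# Register numbers used by 32-bit x86 instruction encodings.
REGISTERS = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
             "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_mov_imm32(register, value):
    """Encode `mov <register>, <value>`: opcode 0xB8 + register number, then a 32-bit little-endian immediate."""
    opcode = 0xB8 + REGISTERS[register]
    return bytes([opcode]) + struct.pack("<I", value & 0xFFFFFFFF)


def encode_ret():
    """Encode a near `ret` instruction."""
    return b"\xC3"


# Machine code for: mov eax, 42 / ret
code = encode_mov_imm32("eax", 42) + encode_ret()
print(code.hex())
```

A compiler that skips the assembly step emits bytes like these directly into a buffer; the register is not stored as a name anywhere, it is folded into the opcode byte itself.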
#6
Posted 29 December 2008 - 01:33 PM
I don't have anything else to help out, sorry.
#7
Posted 29 December 2008 - 01:46 PM
That's cool. I appreciate what you've done so far. Perhaps someone else will know.
In the meantime I'm going to keep searching Google, and slowly trudging through source code. I'll get it figured out at some point. I always do.
Dave











