Where can I learn to create a programming language?

compilers

This topic has been archived. This means that you cannot reply to this topic.
39 replies to this topic

#1 Kreative

    CC Newcomer

  • Member
  • 20 posts

Posted 16 October 2015 - 11:19 AM

Hello,

This is my first time in these forums. I am fluent in Java and can confidently use JavaScript, C#, Ruby, Python, Lua, etc. I have a good idea of how C++ works, as I've encountered it countless times. However, I am fairly naive about the parts of computer science that sit closer to the hardware, such as compilers and interpreters, though I have an okay understanding of the basic theory of how hardware and software work and interact. With this knowledge, where can I begin learning the topics necessary for creating my own programming language, and how can I go about it? Any help is very appreciated, even something as simple as links to videos or step-by-step guides. Thanks very much. :)

Kind Regards,
Kreative 



#2 WingedPanther73

    A spammer's worst nightmare

  • Moderator
  • 17757 posts

Posted 17 October 2015 - 05:53 AM

The first thing I would do is read this book for background on how it works, in general: http://smile.amazon....volution of C++

I'm going to assume you want to build a language as a simple exercise. I would advise you to start by looking at some much simpler languages than the ones you've listed above. For example, become familiar with C or Pascal. Lua may work for your purposes; I'm not sure. You'll also want to do this as an interpreted language rather than a compiled language. Compiling generally involves translating a program into assembly and then invoking the native assembler from there.

The key concepts you'll run into are tokenizing and parsing text.
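
To make "tokenizing" concrete, here is a minimal sketch in Python (an arbitrary choice; any language works) of a tokenizer for a hypothetical toy language with integers, identifiers, the symbols = + - * / ( ) and whitespace. The token kinds and the name tokenize are made up for illustration; real lexers also record line and column numbers for error messages.

```python
import re
from typing import NamedTuple

class Token(NamedTuple):
    kind: str  # "NUMBER", "IDENT" or "OP" in this toy language
    text: str

# Each token kind is described by a regular expression, tried in this order.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/()=]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source: str) -> list[Token]:
    """Split raw source text into a flat list of tokens, rejecting unknown characters."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append(Token(m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("x = 3 + 42*y"))
```

For example, "x = 3 + 42*y" becomes seven tokens (x, =, 3, +, 42, *, y), each tagged with its kind, so the parser only ever deals with this flat list rather than raw characters.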


Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

My MineCraft server site: http://banishedwings.enjin.com/


#3 dargueta

    I chown trolls.

  • Moderator
  • 4854 posts

Posted 20 October 2015 - 08:09 PM

This tutorial series is older than me, but I still find it helpful. Enough of it is still relevant to warrant a look.

http://www.compilers...c.com/crenshaw/


sudo rm -rf / && echo $'Sanitize your inputs!'


#4 Kreative

    CC Newcomer

  • Member
  • 20 posts

Posted 22 October 2015 - 10:18 AM

[Quoting WingedPanther73's post #2]

I do not want to build a language as a simple exercise; I'll do almost anything it takes. Can you elaborate on the purpose of looking at much simpler languages, and what exactly do you mean by "simple"? I'll have a look and see if I can get or find the book.

I'll see if I can read dargueta's tutorial series too. If either the book suggested by WingedPanther73 or dargueta's tutorial series covers this, you don't have to answer. But if it doesn't, or you haven't read either and know how to create a programming language, could you summarise the steps you would take to make your own programming language from scratch? For example: what software you'd use, how you would use it, where you learned to use it, and that kind of thing.

Kind Regards,
Kreative



#5 Gikoskos

    CC Newcomer

  • Member
  • 21 posts

Posted 22 October 2015 - 07:17 PM

[Quoting Kreative's post #4]

Hey Kreative,

Ok, so I don't really know how experienced a programmer you are or what projects you have worked on in the past, but making a programming language is not an easy task.

You need to understand that there are two very distinct stages in creating a programming language. The first is creating a specification for your language. That is everything from the type system and your language's syntax to the communication protocol between your language and your computer.

The second is the development of a compiler or an interpreter (depending on how you want your language to be executed): software that takes text files containing source code in your language, reads the source and translates it into something the computer can understand. In my engineering degree the Compilers course was the hardest of all and very few students ever chose it, so don't worry if you get discouraged after a while. The hardest part won't be the design of the language itself, since that's the fun part. It's going to be the humongous amount of knowledge you need before you can start developing your own language.

I respect that you have experience in other high-level languages, but what these languages really do is provide you with an easy, abstract way to communicate with the computer without having to delve into stuff like asm or machine language. Understand that just because you're very good at a programming language doesn't mean that you know how the computer works. That's hardly ever the case (unless you know a lot of C, and even then C just gives you a tiny idea).

I'll give you a very brief and extremely incomplete review of what a general compiler might look like.

Software that compiles source code to machine language might seem very convoluted, but that's really only if you don't know what you're doing. A compiler is usually divided into separate programs, or stages, that your source code has to run through before reaching the final stage: an executable file (for example, an ELF binary on Linux).

For example, the first stage your source code might go through is a lexer, which performs lexical analysis on your program and basically splits the raw text into tokens (look up the definition of a token to understand what it really is). A lexer usually works together with a parser, which reads that stream of tokens and converts it into data structures, usually in the form of an abstract syntax tree (you can use any other data structure you prefer, but I wouldn't recommend using linked lists for this).
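
The lexer-plus-parser pipeline described above can be sketched in Python for a hypothetical arithmetic-only grammar (shown in the docstring). The lexer is a one-line regular expression that silently skips characters it doesn't recognise, and the node classes Num and BinOp are invented for illustration. The parser is hand-written recursive descent, which is one common technique; tools like Bison instead generate table-driven LALR parsers from a grammar file.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[Num, BinOp]

def tokenize(src: str) -> list[str]:
    # Deliberately minimal lexer: integers, + - * / and parentheses; anything else is skipped.
    return re.findall(r"\d+|[+\-*/()]", src)

class Parser:
    """Recursive-descent parser, one method per grammar rule:
        expr   := term (('+' | '-') term)*
        term   := factor (('*' | '/') factor)*
        factor := NUMBER | '(' expr ')'
    """

    def __init__(self, tokens: list[str]):
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> Optional[str]:
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self) -> Expr:
        node = self.term()
        while self.peek() in ("+", "-"):  # looping makes the operators left-associative
            op = self.advance()
            node = BinOp(op, node, self.term())
        return node

    def term(self) -> Expr:
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = BinOp(op, node, self.factor())
        return node

    def factor(self) -> Expr:
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdigit():
            return Num(int(tok))
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

def parse(src: str) -> Expr:
    parser = Parser(tokenize(src))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return tree

print(parse("1 + 2 * 3"))
```

Operator precedence falls out of the grammar: term binds * and / more tightly than expr binds + and -, so "1 + 2 * 3" becomes BinOp('+', Num(1), BinOp('*', Num(2), Num(3))).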

See this image from Wikipedia to understand it a bit more:

[Image: Wikipedia's scanner and parser example for C]

After that you run your data structure through the main compiler program, which might optimize the code (that means it finds better ways to implement your code as CPU instructions, either to save space or to make it faster) or do whatever else you want it to do, but the final product has to be an executable program that the computer can run.
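
One of the simplest optimizations of this kind is constant folding: computing, at compile time, any operation whose operands are already known. Here is a sketch in Python over a hypothetical syntax tree (Num, Var and BinOp are invented names); real compilers typically do this, and much more, on an intermediate representation rather than directly on the syntax tree.

```python
import operator
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Var, BinOp]

# Division is deliberately left out so folding can never raise a
# divide-by-zero error inside the compiler itself.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node: Expr) -> Expr:
    """Return an equivalent tree in which every constant sub-expression is pre-computed."""
    if isinstance(node, BinOp):
        left, right = fold(node.left), fold(node.right)  # fold children first (bottom-up)
        if isinstance(left, Num) and isinstance(right, Num) and node.op in FOLDABLE:
            return Num(FOLDABLE[node.op](left.value, right.value))
        return BinOp(node.op, left, right)
    return node  # numbers and variables are already as simple as they get

# x * (2 + 3) becomes x * 5: the addition now happens once, in the compiler.
tree = BinOp("*", Var("x"), BinOp("+", Num(2), Num(3)))
print(fold(tree))
```

The variable x is unknown until the program runs, so the multiplication has to stay; only the part of the tree built entirely from constants disappears.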

Now all this is extremely oversimplified (the euphemism is on purpose) and it doesn't even come close to describing the very complex art of compiling. There are a million things you can learn about compilers, interpreters, or anything really that turns your program into something the computer can run, whether that is an executable (e.g. an .exe file) or a script that is interpreted line by line while it is running.

I might have messed some stuff up or sounded confusing while describing this (since I described it the way I think about it), but the thing is, nowadays no one writes a compiler or an interpreter in machine code or even assembly. A compiler for your language is usually written in another language and compiled with that language's compiler; the output is a compiler for your own language. If your language reaches a very advanced level, you can then write the compiler for your language in your own language, a process known as bootstrapping (see: https://en.wikipedia...ing_(compilers)). This is simpler than it sounds, actually.

After you specify the rules of your language (syntax, type system, etc.), you can use one of the languages you are already good at to write a simple compiler and see how it works. Most modern languages began that way (Java, Python, etc.). Of course, it takes more than knowing another programming language's syntax to implement this. For example, if you want your language to support threading, you really have to know the OS-level system calls for your compiler.
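
As an illustration of writing a simple compiler in a language you already know, here is a Python sketch that compiles arithmetic expressions into instructions for an invented stack machine and then runs them. The instruction names (PUSH, ADD, SUB, MUL) and the tuple-based tree are assumptions made for this example; a real compiler would emit assembly or machine code for an actual CPU, but walking the tree and emitting instructions in post-order is a common starting point.

```python
import operator
from typing import Optional, Union

# Hypothetical tree: an int literal, or a tuple (op, left, right).
Expr = Union[int, tuple]

OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}
BINARY = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}

def compile_expr(node: Expr, code: Optional[list] = None) -> list:
    """Emit instructions in post-order: both operands first, then the operator."""
    if code is None:
        code = []
    if isinstance(node, int):
        code.append(("PUSH", node))
    else:
        op, left, right = node
        compile_expr(left, code)
        compile_expr(right, code)
        code.append((OPCODES[op],))
    return code

def run(code: list) -> int:
    """A tiny virtual machine that executes the instructions using an operand stack."""
    stack: list[int] = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            right = stack.pop()  # the right operand was pushed last
            left = stack.pop()
            stack.append(BINARY[instr[0]](left, right))
    return stack.pop()

program = ("+", 2, ("*", 3, 4))  # 2 + 3 * 4
code = compile_expr(program)
print(code)
print(run(code))  # 14
```

Keeping the compiler (compile_expr) separate from the machine (run) mirrors the split between a compiler and the CPU it targets: the compiled instruction list can be stored and executed later without looking at the source again.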

Many languages are implemented in C because of its low-level and inline assembly capabilities, often with ready-made lexer and parser generators such as Flex, Yacc and GNU Bison (see: http://aquamentus.com/flex_bison.html). If you already have knowledge of C++, then you might be familiar with pointers and pointer arithmetic, so C will be a piece of cake for you. You also really need some knowledge of an assembly (symbolic) language to understand how the computer thinks and works, which isn't that easy (see: http://www.drpaulcarter.com/pcasm/).



#6 WingedPanther73

    A spammer's worst nightmare

  • Moderator
  • 17757 posts

Posted 23 October 2015 - 04:21 AM

[Quoting Kreative's post #4]

I've actually helped a friend work on building an interpreter for a language. Note: he did NOT design the language, just build an interpreter for it. He spent a few months on it, with me feeding him ideas.

My own educational background is in math, not programming. I spent a lot of time in courses where we would spec out mathematical languages, defining valid statements, building them up recursively, etc. These didn't have the full scope of a programming language, just defining operations and basic functions. It was not easy, and we were following in the steps of those who had actually created this stuff.

So my motive for suggesting a simpler language, such as C or Pascal, is to make your life easier. Pascal was designed to be easy to parse with limited bells and whistles. C was designed to be close to ASM, but with enough features to make programming vastly easier. If you just tried building an interpreter for either of those languages, you would start to discover how challenging your goal is, and that's without designing the new language.

C++, for example, has a number of constructs whose behavior the language specification leaves "undefined" or "implementation-defined". To implement function overloading, a compiler needs to build multiple versions of the same function and then figure out which one applies in a given call. C and Pascal don't have that issue.
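
To give a feel for the extra work overloading creates, here is a deliberately simplified Python sketch of the resolution step: every version of a function is registered with its parameter types, and each call is matched against them. Everything here (define, resolve, the area overloads) is hypothetical. Python types stand in for the static types a compiler knows at compile time, and real C++ resolution also ranks implicit conversions, templates and more, while giving each version its own mangled symbol name.

```python
import math
from typing import Callable

# Hypothetical overload table: function name -> list of (parameter types, implementation).
OVERLOADS: dict[str, list[tuple[tuple[type, ...], Callable]]] = {}

def define(name: str, param_types: tuple[type, ...], impl: Callable) -> None:
    """Register one more version of `name`."""
    OVERLOADS.setdefault(name, []).append((param_types, impl))

def resolve(name: str, arg_types: tuple[type, ...]) -> Callable:
    """Pick the version whose parameter types exactly match the argument types of a call."""
    matches = [impl for params, impl in OVERLOADS.get(name, []) if params == arg_types]
    if not matches:
        raise TypeError(f"no version of {name} accepts {arg_types}")
    if len(matches) > 1:
        raise TypeError(f"call to {name} with {arg_types} is ambiguous")
    return matches[0]

define("area", (int,), lambda side: side * side)                 # square
define("area", (int, int), lambda width, height: width * height)  # rectangle
define("area", (float,), lambda radius: math.pi * radius ** 2)    # circle

print(resolve("area", (int, int))(3, 4))  # 12
```

Even this exact-match-only version has to handle "no match" and "more than one match"; once implicit conversions are allowed (an int argument for a float parameter, say), deciding which candidate is "best" becomes much harder.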

If you're trying to design a language based on the high-level languages you're familiar with, I strongly suspect you'll find yourself trying to support the many cool language features that are actually incredibly hard to implement well. Worse, you'll confuse your syntax and likely "allow" things that cannot be parsed consistently. Just as you started learning programming with variables, arithmetic and if statements, start building a language with limited features.




#7 dargueta

    I chown trolls.

  • Moderator
  • 4854 posts

Posted 25 October 2015 - 12:03 PM

[Quoting WingedPanther73's post #6]

As someone with a background in computer science (and who's taken a compiler course in college), I strongly recommend you take WP's advice and start with an interpreter for something simple. QuickBASIC, for example, would be a good place to start in my opinion. That way you can test your implementation against an existing interpreter and see if your interpreter's output matches the original's!
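
To show the scale of such a first project, here is a Python sketch of an interpreter for a tiny, made-up subset of BASIC (it is not QuickBASIC-compatible): numbered lines, LET, PRINT, GOTO, IF ... THEN line and END, with every token separated by spaces and expressions limited to a single operator. It executes the program one statement at a time, keeping a "program counter" that points at the next line to run.

```python
import operator

ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.floordiv}
COMPARE = {"<": operator.lt, ">": operator.gt, "=": operator.eq}

def value(tok: str, env: dict[str, int]) -> int:
    """A token is either an integer literal or the name of a variable."""
    return int(tok) if tok.lstrip("-").isdigit() else env[tok]

def eval_expr(tokens: list[str], env: dict[str, int]) -> int:
    """Evaluate 'a' or 'a op b'; anything longer is outside this toy subset."""
    if len(tokens) == 1:
        return value(tokens[0], env)
    if len(tokens) == 3 and tokens[1] in ARITH:
        return ARITH[tokens[1]](value(tokens[0], env), value(tokens[2], env))
    raise SyntaxError(f"unsupported expression {' '.join(tokens)!r}")

def run_basic(source: str) -> list[str]:
    """Run a toy BASIC program and return everything it PRINTed."""
    lines = {}
    for raw in source.strip().splitlines():
        number, statement = raw.split(maxsplit=1)
        lines[int(number)] = statement.split()
    order = sorted(lines)            # execution follows line-number order
    env: dict[str, int] = {}
    output: list[str] = []
    pc = 0                           # index into `order` of the next line to run
    while pc < len(order):
        keyword, *args = lines[order[pc]]
        pc += 1                      # by default, fall through to the next line
        if keyword == "LET":         # LET X = expr
            env[args[0]] = eval_expr(args[2:], env)
        elif keyword == "PRINT":     # PRINT expr
            output.append(str(eval_expr(args, env)))
        elif keyword == "GOTO":      # GOTO line
            pc = order.index(int(args[0]))
        elif keyword == "IF":        # IF a op b THEN line
            a, op, b, _then, target = args
            if COMPARE[op](value(a, env), value(b, env)):
                pc = order.index(int(target))
        elif keyword == "END":
            break
        else:
            raise SyntaxError(f"unknown statement {keyword!r}")
    return output

program = """
10 LET X = 1
20 PRINT X
30 LET X = X + 1
40 IF X < 4 THEN 20
50 END
"""
print(run_basic(program))  # ['1', '2', '3']
```

Comparing the printed output with what a real BASIC interpreter produces for the same program is exactly the kind of testing described above.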


Edited by dargueta, 25 October 2015 - 12:04 PM.



#8 hockey97

    CC Resident

  • Advanced Member
  • 95 posts

Posted 27 October 2015 - 07:20 AM

OP: why do you want to build a programming language? There's no point in doing it since C++ and many other languages exist. There's no reason to develop a new programming language since all languages do the same thing. No one really makes a new programming language anymore. C++ can work on any system as long as the hardware chipsets aren't customized; C++ will work on most systems.

I doubt you have the ability to program a programming language. Even if you did, you would need a team of programmers to build it for you in order to finish such a project anytime soon. It's not easy and it's not a quick project. Even if you do build one, you will never make anything better than what has already been made by people who have PhDs in computer science.



#9 Kreative

    CC Newcomer

  • Member
  • 20 posts

Posted 27 October 2015 - 12:44 PM

[Quoting hockey97's post #8]

I want to create a programming language because of some limitations I see in major programming languages such as Java and perhaps C++ (I don't know enough about C and C++ to be sure). The main limitation, which I have run into many times, is that I can't create a field/member or method/function that is abstract/virtual, and therefore inherited, but is the same across all instances of a class. I've heard Delphi and perhaps Ruby allow something like this, but not Java or C++. After encountering this obstacle, I searched for other programming languages, yet failed to find one that combines the good features of the major languages with the features they lack. I have also come across, though not delved very deep into, other good features that some languages have and most major languages don't, which would decrease code repetition, save space, increase efficiency, and so on. As far as I know, many people agree that these features are lacking from many programming languages and should be implemented; many others disagree, but have failed to convince me, or to get me to understand, how these features would cause more harm than good.

I don't doubt it; I know I lack many of the skills and much of the knowledge needed to program a programming language, and I am eager to overcome this problem, even if it means I have to gather a team. I know, or at least strongly believe, that creating a programming language is very difficult, perhaps a million times more difficult than learning a programming language and programming with it, and a very long process. However, that currently does not discourage me, as a lot of my entertainment revolves around creation, especially on computers, and the obstacles I encounter daily are motivating me to find a solution, such as creating a programming language.

@Gikoskos
The oversimplified explanation you wrote doesn't feel very surprising or confusing to me; you basically described what is done in order to create a programming language, and that much I clearly understand. What I am more interested in is HOW those things are actually done.

@WingedPanther73
I never learned a programming language step by step in a very orderly fashion. I skimmed through a lot of tutorials, picked out things that seemed relevant and asked very specific, narrow questions, all in order to achieve a certain goal. When I wanted to make a game in Lua, I never made, or intended to make, any small games to get used to Lua; I headed straight for the specific information involved in making the game I wanted to make. Many, many people say this is a bad way of learning, because something that seems unnecessary or irrelevant to what you are trying to achieve is often needed to form a better understanding of the general topic, which in turn is needed to achieve the goal properly. However, this way of learning has never failed me, and I never find myself regretting skipping the extra information above or below the relevant section of a tutorial, because if it turns out to be necessary, I go back and read it.

I am asking in these forums because I keep having trouble finding good tutorials or information about compilers and everything from 0s and 1s to a programming language.

Kind Regards,
Kreative



#10 hockey97

    CC Resident

  • Advanced Member
  • 95 posts

Posted 27 October 2015 - 01:22 PM

[Quoting Kreative's post #9]

The problem is that what you're asking for doesn't exist for free. You can buy many books and read a lot about compilers, but you won't find free information online or a tutorial that will take you through it step by step.

What you're asking is like asking someone on the internet how to build a space shuttle like NASA's. The people involved in those projects have PhDs; they went to college. You will never find a tutorial online on how to build a space shuttle step by step. Since the subject matter is very complex, no one person can just give a small tutorial on how to do it. It would take at least a few years to explain it entirely, and no one gives such information away for free if they have to put a lot of work and effort into it.

Learning how a compiler works requires you to have an idea of how a processor works and how ASM works. Even if you learned how to do this, it's very, very hard for a single person to produce a working language that is way better than what's already available. C++ and many other high-level languages are written by teams of people. The first versions were written by a single person, but over time they needed to add complex features. The features used in today's programming languages are developed and maintained by teams of programmers, and these people went to college to learn how to do such things.

My point is that you won't find the answer online. You would need to go to college, major in CS and learn it that way. You can buy books, but those books assume some working knowledge of programming, ASM, a high-level language you have experience with, and how computers work.



#11 Kreative

    CC Newcomer

  • Member
  • 20 posts

Posted 27 October 2015 - 03:05 PM

[Quoting hockey97's post #10]

I see what you mean. I've come across a book called Write Great Code, which isn't just about code; it does talk about compilers to some extent, and even though you have to buy it, I've found PDFs online of, for example, the first volume. I'd love to go to college or uni to learn computer science, but I haven't finished school yet. Generally I feel extremely confident with programming theory: not the theory of creating programming languages or compiler theory, but the theory of how most programming languages work today. Having come across some obstacles in my programming, I was determined to find a solution and haven't been getting anywhere for about a year now, and as I searched I naturally built a very solid and concrete understanding of programming theory, which I believe could be enough to understand some books. Generally speaking, I really dislike books; I prefer audio-visual learning or being taught by a person or people, though I'll do anything it takes.

Just to give some detail of what I know, so you can understand my situation better, here's a crude list of examples. I know how pointers or references work, and to some extent how they look in memory. I know that linked-list elements consist of a value and a pointer to the next list element in memory. From what I've read, I believe that values whose size isn't a power of two are stored in whole bytes; for example, a 24-bit value would be stored in three bytes. I've seen examples of functions being broken down into memory instructions, and nothing seemed strange, suspicious or new about how it is done. I've never seen examples, but I can imagine how compilers change code to make it more time- or space-efficient when it is converted into lower-level languages, and how classes are instantiated: instances are stored with a pointer to their class and the values of the class's fields, perhaps with pointers to which identifier each value belongs to. This is what I could pull out of my unsure understanding at the moment.
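
The object layout described above (an instance holding a pointer to its class plus its own field values) can be sketched in Python as a hand-rolled object model. This is one possible design, assumed for illustration: methods are looked up by name along the chain of parent pointers, roughly how dynamic languages work, whereas C++ compilers typically store in each object a pointer to a table of function pointers (a vtable) and call through a fixed slot instead. ClassObj, Instance and send are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class ClassObj:
    """Run-time record for a class: its name, a pointer to its parent, and its methods."""
    name: str
    parent: Optional["ClassObj"]
    methods: dict[str, Callable] = field(default_factory=dict)

@dataclass
class Instance:
    """Run-time record for an object: a pointer to its class plus its own field values."""
    cls: ClassObj
    fields: dict[str, object]

def send(obj: Instance, method: str, *args):
    """Dynamic dispatch: start at the object's own class and walk up the parent
    pointers until some class defines the method, then call it with obj as 'self'."""
    cls = obj.cls
    while cls is not None:
        if method in cls.methods:
            return cls.methods[method](obj, *args)
        cls = cls.parent
    raise AttributeError(f"{obj.cls.name} has no method {method!r}")

# Shape.describe is inherited by Square, but it calls 'area', which only Square defines.
Shape = ClassObj("Shape", None, {"describe": lambda self: f"shape with area {send(self, 'area')}"})
Square = ClassObj("Square", Shape, {"area": lambda self: self.fields["side"] ** 2})

sq = Instance(Square, {"side": 3})
print(send(sq, "describe"))  # shape with area 9
```

The inherited describe method works for any subclass that supplies area, which is the essence of a virtual method; what a compiler adds is doing as much of this lookup as possible ahead of time.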

Kind Regards,
Kreative



#12 WingedPanther73

    A spammer's worst nightmare

  • Moderator
  • 17757 posts

Posted 27 October 2015 - 04:37 PM

[Quoting Kreative's post #9]

What you've stumbled across is the need to make compromises in desired language features. Also, the example you cited doesn't sound quite right for C++. I suggest you delve into it more.

Regardless, language design decisions become very tricky. Java only allows single inheritance + interfaces. C++ allows multiple inheritance and no interfaces. By doing this, Java avoids potential problems resolving functions when two parents have a common parent. On the other hand, C++ can define part of a class that is meant to be inherited, making inheritance far more powerful than interfaces. Writing a C++ compiler is vastly harder than writing a Java compiler, as a consequence.
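
To see the kind of rule a language designer has to pin down for multiple inheritance, here is how Python (not one of the languages discussed above, used only as an illustration) resolves the "two parents with a common parent" case: it computes a single method resolution order with the C3 linearization algorithm and searches classes in that order. C++ takes a different approach: an ambiguous call through two bases is a compile-time error unless you qualify it, and virtual inheritance controls whether the shared base is duplicated. The class names below are made up.

```python
class Device:
    def describe(self) -> str:
        return "device"

class Scanner(Device):
    def describe(self) -> str:
        return "scanner"

class Printer(Device):
    def describe(self) -> str:
        return "printer"

# Copier inherits from two classes that share the parent Device (a "diamond").
class Copier(Scanner, Printer):
    pass

# Python linearizes the hierarchy; method lookup walks this list left to right.
print([cls.__name__ for cls in Copier.__mro__])  # ['Copier', 'Scanner', 'Printer', 'Device', 'object']
print(Copier().describe())                        # scanner
```

Swapping the base order to Copier(Printer, Scanner) would make describe() return "printer" instead; every language with multiple inheritance has to specify some rule like this, and its compiler has to implement it.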

The fact that you want to dive into tricky inheritance issues when you aren't familiar with a procedural language is, to me, quite scary. I would absolutely NOT advise you to do that as your first "built" language. Cut your teeth on something easier, or you're setting yourself up for failure. Once you've written something like a QuickBasic interpreter, then you'll be in a position to start thinking about how to handle object-oriented languages.






