Hello,
I'm writing a new RDBMS engine (just for the fun of writing it). If anyone is interested, we could make it a new open-source project, just like MySQL or PostgreSQL.
The unique features of my project are:
- autonomic database tuning and adaptation to workload
- autonomic memory configuration
- strong query planner, much better than those of other open-source engines
- native replication and HA
- true transaction serializability
First, I want to make it performance-oriented and easy to use, and NOT feature-rich (so don't expect support for zillions of types, spatial indexes or stored procedure programming languages).
I have already written some parts of it in Java (the SQL parser and the query planner). Does anyone want to join?
Great Project...
I'm of no help writing,
but give me a shout when you need a tester.
"I have already written some parts of it in Java (SQL parser and the query planner)."
Enough said.
Could you elaborate more on this? If you tried to sound ironic, you are badly mistaken - Java is a perfect language to write an RDBMS in, much better than C or C++. A naive nested loops join hand-optimized in assembly will always be much slower than a buffered nested loops join with hashing coded in Java. It is the algorithms that make an RDBMS fast, not the language it is implemented in.
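To make the algorithmic point concrete, here is a minimal sketch of the two join strategies on in-memory integer keys. It uses a plain hash join as a stand-in for the "buffered nested loops join with hashing" mentioned above; the class and method names (JoinSketch, nestedLoopJoin, hashJoin) are made up for illustration and do not come from any real engine.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class JoinSketch {
    // Naive nested loops join: compares every left key with every right key,
    // so the work grows as left.length * right.length.
    static List<int[]> nestedLoopJoin(int[] left, int[] right) {
        List<int[]> out = new ArrayList<>();
        for (int l : left) {
            for (int r : right) {
                if (l == r) {
                    out.add(new int[] {l, r});
                }
            }
        }
        return out;
    }

    // Hash join: count the keys of one input in a hash table, then probe it
    // once per key of the other input. Expected work grows roughly as
    // left.length + right.length + number of output rows.
    static List<int[]> hashJoin(int[] left, int[] right) {
        Map<Integer, Integer> rightCounts = new HashMap<>();
        for (int r : right) {
            rightCounts.merge(r, 1, Integer::sum);
        }
        List<int[]> out = new ArrayList<>();
        for (int l : left) {
            int matches = rightCounts.getOrDefault(l, 0);
            for (int i = 0; i < matches; i++) {
                out.add(new int[] {l, l});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] left = {1, 2, 2, 3};
        int[] right = {2, 3, 3, 4};
        System.out.println("nested loops rows: " + nestedLoopJoin(left, right).size());
        System.out.println("hash join rows:    " + hashJoin(left, right).size());
    }
}
```

Both methods return the same set of matching pairs; only the amount of work differs, and that gap widens with input size regardless of the implementation language.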
And the query planner I wrote for a research project is much better than PostgreSQL's* - it is slower, but it produces much better plans (e.g. it considers bushy join trees, rewrites correlated subselects into joins, and leverages the LIMIT clause to favour pipelining).
*) It is quite easy to be better - the PostgreSQL planner is really dumb - e.g. it cannot estimate the selectivity of such simple queries as: SELECT * FROM table WHERE a < b. Or when you add a LIMIT 10 clause, it proposes to compute everything and then output only the first 10 rows. Or it cannot use index-only scans (OK, in fact this is a problem of the poor MVCC implementation, not the planner's - indexes have no version info, so every index scan has to fetch rows from the indexed tables). MySQL is even dumber (an ancient rule-based planner).
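The LIMIT point can be illustrated with a small sketch that counts how many rows each strategy touches. It uses Java streams as a stand-in for a pipelined executor, with a filter on even numbers standing in for a WHERE clause; LimitSketch and its methods are hypothetical names for illustration only, not how any particular planner is implemented.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class LimitSketch {
    // "Compute everything, then output the first rows": the whole table is
    // scanned and filtered before the limit is applied.
    static List<Integer> materializeThenLimit(int tableSize, int limit, AtomicInteger scanned) {
        List<Integer> all = IntStream.range(0, tableSize)
                .peek(i -> scanned.incrementAndGet())   // count every row read
                .filter(i -> i % 2 == 0)                // stand-in for a WHERE clause
                .boxed()
                .collect(Collectors.toList());
        return all.subList(0, Math.min(limit, all.size()));
    }

    // Pipelined plan: rows flow through the filter one at a time and the scan
    // stops as soon as the limit is satisfied.
    static List<Integer> pipelinedLimit(int tableSize, int limit, AtomicInteger scanned) {
        return IntStream.range(0, tableSize)
                .peek(i -> scanned.incrementAndGet())
                .filter(i -> i % 2 == 0)
                .limit(limit)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        AtomicInteger full = new AtomicInteger();
        AtomicInteger lazy = new AtomicInteger();
        materializeThenLimit(1_000_000, 10, full);
        pipelinedLimit(1_000_000, 10, lazy);
        System.out.println("rows scanned without pipelining: " + full.get());
        System.out.println("rows scanned with pipelining:    " + lazy.get());
    }
}
```

Both methods return the same ten rows; the pipelined version simply reads far fewer input rows to get there.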
Haha, Java is absolutely not the language to use to write a performance RDBMS. It isn't the language to write a performance anything.
A. Virtual Machine. Right there, you've hit an unstoppable performance block.
B. Big Endian. Everything in Java is big endian. Working with the OS or reading configuration files, saving SQL dumps, etc. will all have to be flipped every time you read/write it.
C. Why would you write your loop in assembly? That's back in the day when C compilers were dumber than humans. GCC or VC usually knows better than you do.
D. Loops in Java will never, ever, ever be faster than the equivalent loop in C. If you benchmarked this and somehow Java came out on top, you need to re-learn C.
Java is OK for certain things. And now that it's open source and getting active community work, it'll just keep getting better. But producing a performance-critical application is not in its job description. A stability-critical application, maybe.
And don't even go the portability route. It's very easy to make your C/C++ program portable.
Some benchmarks show you are totally wrong.
The H2 Java RDBMS IS faster than both MySQL and PostgreSQL for most queries tested in open-source benchmarks. This does not prove Java is faster than C, but it proves it is possible to write a system that is just as fast or even faster in Java in a much shorter time (H2 was written by one guy in 2 years). Another comparison - Tomcat vs the Apache web server (Tomcat is slightly faster - check for yourself if you don't believe me).
It seems you don't have the slightest idea how database systems work.
Big-endianness is not a performance issue - disks are orders of magnitude slower than memory - so flipping these bytes is unnoticeable. Just try reading a large binary file in Java and in C - the performance is exactly the same.
And having a consistent binary format can save you a lot of trouble when moving a database from one server to another (software in C or C++ cannot do this efficiently if the architectures differ).
On the contrary, stability is ALSO a performance issue. How fast is an RDBMS that has memory leaks?
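As a small illustration of the byte-order point, the sketch below uses the standard java.nio.ByteBuffer class, whose default byte order is big-endian on every platform. The class name EndianSketch and its helper methods are invented for this example; it only shows how the on-disk layout stays identical across machines and says nothing about how any existing database stores its files.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class EndianSketch {
    // Encode an int with ByteBuffer's default order, which is big-endian
    // regardless of the CPU the JVM is running on.
    static byte[] encodeBigEndian(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // The same value encoded explicitly as little-endian, for comparison.
    static byte[] encodeLittleEndian(int value) {
        return ByteBuffer.allocate(Integer.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(value)
                .array();
    }

    // Decode bytes written by encodeBigEndian on any machine.
    static int decodeBigEndian(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    public static void main(String[] args) {
        int value = 0x01020304;
        System.out.println("big-endian bytes:    " + Arrays.toString(encodeBigEndian(value)));
        System.out.println("little-endian bytes: " + Arrays.toString(encodeLittleEndian(value)));
        System.out.println("native order here:   " + ByteOrder.nativeOrder());
        System.out.println("round trip ok:       " + (decodeBigEndian(encodeBigEndian(value)) == value));
    }
}
```

Because the encoded bytes do not depend on ByteOrder.nativeOrder(), a file written this way on one architecture can be read back unchanged on another.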
Java also HAS some big performance advantages over C and C++ when it comes to multithreading and locking (also very important in real RDBMSes). Can your optimizing C++ compiler do biased locking or lock coarsening optimizations?
However, the biggest issue is the productivity you gain by writing software in high-level languages - in the same time you can write a better query planner and implement more join algorithms or index types than in low/middle-level languages like C or C++. Algorithms are the most important thing affecting the performance of an RDBMS. Both MySQL (C++) and PostgreSQL (C) have algorithms from the previous decade and are developed very slowly.
Well, before this turns into a Java VS Misc war, I'm going to be an ass.
You are wrong, and you will find it out the hard way. Enjoy. In 10 years if you prove me wrong, I'd be glad to hear about it.
I do not have to prove anything, because there are already RDBMSes written in Java that are faster than those written in C or C++, at least when it comes to open-source ones - so it has already been proved.
When we switched JBoss JMS from HSQLDB to PostgreSQL, we got about a 10x slowdown, even though the strict durability of transactions in PostgreSQL (fsync) was switched off, so nothing prevented it from caching all reads and writes in memory. Both configurations were fully transactional and were writing data to disk.
BTW: Java is compiled to native code by the JIT compiler, and in strictly numeric benchmarks it has very similar performance to C/C++. In the most pessimistic cases it loses by about 50%, but in the most optimistic ones it wins by about the same margin (especially when it comes to creating lots of small objects on the heap - C++ is a terrible loser there). Heap memory consumption is also very similar, though it is hard to measure (benchmark measurements on the Internet are especially unfair - they include permgen and JVM memory in the reported memory consumption for Java, but skip the memory taken by the OS and dynamically linked libraries for C++ programs).
Do you have a link?
Maybe I can be a tester for those systems.
I think TkTech is a pretty sharp guy and you seem to be also,
so I don't understand why either of you are bothering with this
discussion, for that matter... I have better things to do also,
good luck with your project JCoder.
I cannot give links, because the forum software doesn't allow me to.
But write "H2 database" and "HSQLDB database" into Google and click the first result.
It is astonishing that a single guy could do something like H2 in a little over 2 years. HSQLDB has great performance if the whole database fits in memory. And if it does not, raw computational speed (CPU cycles) almost does not matter - large databases are all about doing I/O efficiently. This can be done in almost any programming language. I would risk saying that it is possible to write a fast RDBMS in Python or Ruby (but I don't like using dynamically typed languages for such large and complex projects).
TkTech still hasn't given any technical arguments to support his point of view (except the "endianness problem", which is rather a great feature** and not a performance problem at all), so yes - this performance off-topic is pointless.
**) Migrating a database is as easy as copying the database folder to another machine, forgetting that the source machine was Windows on x86 and the target one is Solaris / UltraSPARC or 64-bit Linux on AMD Athlon... No C/C++ database system can do this.