
OneNote clone - use XML or DB?

7 replies to this topic

#1
nihilist


    Newbie

  • Members
  • Pip
  • 4 posts
Hi guys!

I'm a hobbyist programmer, former professional programmer.
I've never had to deal with databases and data storage theory in general, thus I come here to ask you guys this.

Ever seen Microsoft OneNote?
If not, it's easy to find a screenshot on Google.

I want to program a similar program.
The whole reason to make this program is so that I can carry just 2 files with me on my pendrive (the program and some sort of database), easily navigate all my countless notepad text files, search them using regex, etc.

The similarity to OneNote is: the program will have 3 types of tabs, just like OneNote:
The left column tabs are the "books"
The tabs over the page are the "dividers"
The right column tabs are the "pages"

Now, that means that each page would be the equivalent of a notepad .txt file.
And that means I'd be able to more easily categorize "kinds" of files into dividers and books (url dumps, general notes, reminders, etc would be different dividers or books, depending on my mood and the way I organize each category)

My question is: what's the best way to store all that data?
what's the best way to store all those different text files into just one big file?

I know that asking "what is the best" is kind of empty.
That's why I'd like to admit how little I know about the subject and explain the issues I'm considering.
Please keep in mind that I know nothing about data storage, so this is probably going to be very, very lame.

I was considering the pros and cons between XML and Databases.
With XML, I'd be able to store files of all sorts of sizes (2 MB, 10 kB...) without "wasting" space, because I would not be allocating any space like databases do (to the best of my knowledge, which is zero).
Variable sizes would mean slow lookups because I wouldn't be able to do binary tree searches, given that I wouldn't have any pointers to the files.
That would also mean that it would be better to keep files in RAM when working on several different "pages" at once, because it would be slow to open and reopen files (there's no list with pointers to all the files).

With databases, OTOH, I'd be allocating fixed sizes, so that would mean faster lookups and also not having to keep files in RAM while working on several files, because it's so fast to open and reopen them.
The downside is that I'd be wasting disk space and I'd have no idea of how much to allocate for files anyway. (there's no average size for the files I save).

So, is there a way to have:
Variable size for files
Still be able to access files reasonably fast so that I don't have to keep the files I'm working on all in RAM
Fast look up
and a way to make it work well with the kind of hierarchy that I mentioned (books contain dividers, dividers contain pages, pages are the equivalent to notepad .txt files)

Or am I asking too much?
How would you make this work?
My goal is having only two files and use a very reliable database system (reliability is top priority, far far away from performance), anything else is extra.

Thanks in advance.

I was actually wanting to do this program so I can put it in my pendrive and carry all my documents around with me, and also have an interface that allows me to easily browse all of them.

But I was thinking. Maybe I could just format my pendrive as NTFS with 512 byte clusters, that way I can just throw the files in folders and subfolders and just have them there without any database.

The whole reason I was going with a database solution was to avoid wasting cluster space. I'm going to be saving pretty small files, some of them with just one URL in them, so regular 4096-byte clusters would waste a lot of space, but 512-byte clusters would not. What do you think?
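
To put rough numbers on the cluster-size question above, here is a small Python sketch that estimates the space lost to partially filled clusters. The file sizes are made up for illustration, and the model is simplified: it ignores file system metadata, and NTFS can keep very small files inside its own file table, so real figures will differ.

```python
def allocated_bytes(file_size: int, cluster_size: int) -> int:
    """Space a file occupies when the disk hands out whole clusters only."""
    clusters = (file_size + cluster_size - 1) // cluster_size  # round up
    return clusters * cluster_size


def slack_bytes(file_sizes, cluster_size: int) -> int:
    """Total bytes allocated but not used by the files' actual data."""
    return sum(allocated_bytes(size, cluster_size) - size for size in file_sizes)


# Hypothetical note sizes in bytes: a single URL, a short reminder,
# a medium note, and a longer text dump.
sample_sizes = [60, 300, 2000, 10_000]

for cluster in (512, 4096):
    print(cluster, "byte clusters waste", slack_bytes(sample_sizes, cluster), "bytes")
```

Under this simplified model the waste per file is always less than one cluster, so the total scales with the number of files rather than with their size.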

Edited by WingedPanther, 11 April 2009 - 03:42 AM.
Double post


#2
WingedPanther


    A spammer's worst nightmare

  • Moderators
  • 16,831 posts
If you're just looking to be able to search across multiple text files, there are a lot of programs that can do that already (such as jEdit, Crimson Editor, etc). A database would normally store a text file in a BLOB field, which is frequently not searchable in queries. XML can be used as a type of file-based database, along with several others such as SQLite. I suspect you need to worry more about how you want to display the data than how to store it at this point.
Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

#3
Guest_Jordan_*

  • Guests
I'm partial to databases and that is the route I'd take. You don't have to use a BLOB for text; a regular text column can be searched. However, there is nothing wrong with storing all of your data in XML files. After all, that is the intention of XML.

As for your second question, I'm not sure what you mean? Store them all in one file?
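
As a rough illustration of the database route with searchable text columns, here is a minimal Python sketch using the standard `sqlite3` module. The table and column names, the helper functions, and the sample notes are all assumptions made up for this example, not anything from the thread. Books, dividers, and pages become three linked tables, page bodies are variable-length `TEXT`, and everything lives in a single database file.

```python
import re
import sqlite3

# Hypothetical schema: one table per level of the books/dividers/pages hierarchy.
SCHEMA = """
CREATE TABLE IF NOT EXISTS books (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS dividers (
    id      INTEGER PRIMARY KEY,
    book_id INTEGER NOT NULL REFERENCES books(id) ON DELETE CASCADE,
    name    TEXT NOT NULL,
    UNIQUE (book_id, name)
);
CREATE TABLE IF NOT EXISTS pages (
    id         INTEGER PRIMARY KEY,
    divider_id INTEGER NOT NULL REFERENCES dividers(id) ON DELETE CASCADE,
    title      TEXT NOT NULL,
    body       TEXT NOT NULL DEFAULT '',
    UNIQUE (divider_id, title)
);
"""


def regexp(pattern, text):
    """Backs SQLite's REGEXP operator, which has no built-in implementation."""
    return int(text is not None and re.search(pattern, text) is not None)


def open_notebook(path=":memory:"):
    """Open (or create) the single-file notebook database."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.create_function("REGEXP", 2, regexp)
    conn.executescript(SCHEMA)
    return conn


def add_page(conn, book, divider, title, body):
    """Insert a page, creating its book and divider if they do not exist yet."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("INSERT OR IGNORE INTO books (name) VALUES (?)", (book,))
        book_id = conn.execute(
            "SELECT id FROM books WHERE name = ?", (book,)).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO dividers (book_id, name) VALUES (?, ?)",
            (book_id, divider))
        divider_id = conn.execute(
            "SELECT id FROM dividers WHERE book_id = ? AND name = ?",
            (book_id, divider)).fetchone()[0]
        conn.execute(
            "INSERT INTO pages (divider_id, title, body) VALUES (?, ?, ?)",
            (divider_id, title, body))


def search(conn, pattern):
    """Return (book, divider, page title) for every page whose body matches."""
    return conn.execute(
        """SELECT b.name, d.name, p.title
           FROM pages p
           JOIN dividers d ON d.id = p.divider_id
           JOIN books b ON b.id = d.book_id
           WHERE p.body REGEXP ?
           ORDER BY b.name, d.name, p.title""",
        (pattern,)).fetchall()


conn = open_notebook()  # pass a file path instead to keep the data on disk
add_page(conn, "School", "Astronomy", "Mercury", "Closest planet to the Sun.")
add_page(conn, "Home", "Links", "Dump", "http://example.com/notes")
print(search(conn, r"https?://"))
```

The regex search here scans every page body, which should be acceptable for a personal collection of notes. Encryption is not covered by this sketch.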

#4
nihilist


    Newbie

  • Members
  • Pip
  • 4 posts
Ok, I appreciate you guys' willingness to try and help me out.
Let me clarify this.

This is the original OneNote:
(I can't seem to put a link here, but please just do a Google image search for "onenote" and take a look at any screenshot from microsoft.com)

Never mind the ability to use a tablet with it.
What I like in it is the navigation system:
On the left side you have the books: "Shared", "OneNoteGuide" (the ones that come with it...), School, Work, Home.
Up above you have the dividers of the School book: Research, Astronomy, Math, History, Chemistry.
On the right side you have the pages of the Astronomy divider: Mercury, Venus, etc.

It's simple but it's all I need, really.
Now, in OneNote you can add pictures, sounds, and whatnot to your page. Never mind that.
I just need raw text. And the ability to search it.

What I'm trying to figure is what's the best way to store all this text.
The way I want to display the files, to answer the first question, is just like the screenshot above.

I forgot to mention that I do want to be able to encrypt the whole database with a password, so as to make it unreadable by other people.

Oh, and by storing all the texts in one file I mean that I want to have ideally just two files on my pendrive: the software, and the database. I do not want each text ("page") to be its own file.
Also, because of my need to encrypt it, it would be essential to have all the text in one database file.

A friend of mine told me that if I want to carry this kind of program and its database around with me, I wouldn't be able to use MySQL, since you have to install that on the machine you're going to be using.
If that's the case then indeed MySQL would be a no-no for me.
I need something that works off of my pendrive and that I don't have to install anything on the machine for it to work.

Thanks.

#5
WingedPanther


    A spammer's worst nightmare

  • Moderators
  • 16,831 posts
You may want to consider an encrypted zip file that contains .txt files in it. They're easy to search and wouldn't require any overhead. If you felt the need to go more advanced, SQLite is probably your best bet.

#6
nihilist


    Newbie

  • Members
  • Pip
  • 4 posts

WingedPanther said:

You may want to consider an encrypted zip file that contains .txt files in it. They're easy to search and wouldn't require any overhead. If you felt the need to go more advanced, SQLite is probably your best bet.

Whoa! That would be the perfect solution for me.
Not having to deal with the encryption myself and then having it all in one file, AND being able to even have folders in it!
Yeah, if I can have that, then screw databases.
I'm trying to find info on how to search .zip files programmatically.
How do you do it?
Can you point me in the right direction?
I'll be using VB.NET, I think, but any VB6 or C/C++ code will be fine.

Thanks a lot for the solution.

#7
WingedPanther


    A spammer's worst nightmare

  • Moderators
  • 16,831 posts
I know that C++ with wxWidgets can open files in a zip file. Just open each file inside it and search. Log results someplace so you can decide which txt file to work with. wxWidgets treats zip files just like a folder, so you would be doing "ordinary" file processing.
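
The "open each file and search" loop described above can be sketched as follows. This is an illustration in Python with the standard `zipfile` module, not the wxWidgets approach, and the function name and sample archive are made up for the example. Each entry is decompressed into memory, so no temporary file is written. As far as the standard library goes, `zipfile` can read entries protected with the legacy ZIP password scheme when given `pwd`, but it cannot create encrypted archives.

```python
import io
import re
import zipfile


def search_zip(archive, pattern, pwd=None, encoding="utf-8"):
    """Return (entry name, line number, line) for each matching line in .txt entries.

    `archive` may be a path or a file-like object; `pwd` is an optional
    password as bytes for entries that use legacy ZIP encryption.
    """
    regex = re.compile(pattern)
    hits = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".txt"):
                continue
            # read() decompresses the entry straight into memory.
            text = zf.read(info, pwd=pwd).decode(encoding, errors="replace")
            for line_no, line in enumerate(text.splitlines(), start=1):
                if regex.search(line):
                    hits.append((info.filename, line_no, line))
    return hits


# Build a small unencrypted archive in memory to try the search on.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("School/Astronomy/Mercury.txt",
                "Closest planet.\nSee http://example.com/mercury\n")
    zf.writestr("Home/todo.txt", "buy milk\n")

for name, line_no, line in search_zip(buf, r"https?://"):
    print(f"{name}:{line_no}: {line}")
```

Folder names inside the archive (here a hypothetical `Book/Divider/Page.txt` layout) could stand in for books and dividers.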

#8
nihilist


    Newbie

  • Members
  • Pip
  • 4 posts
Hmm, I think I changed my mind.

I'm thinking of using SQLite + VB.Net if only I can get it to work. It's giving me some errors and warnings that I'm still trying to solve.

I can't seem to find code snippets showing how to search files inside zip files, and I don't think that's possible without creating a temp file. That would be a security breach for me, and also a performance issue.

SQLite seems like a great solution. I'm looking at some code and it seems simple. I still have to get the example I downloaded to work, but I will.

Thanks for the help!

Any known advantages of Embedded Firebird DB over SQLite?