I'm a hobbyist programmer and a former professional programmer.
I've never had to deal with databases or data storage theory in general, so I've come here to ask for your advice.
Ever seen Microsoft OneNote?
If not, it's easy to find a screenshot on Google.
I want to write a similar program.
The whole point of writing it is that I could carry just 2 files on my pendrive (the program and some sort of database) and easily navigate all my countless notepad text files, search them using regex, etc.
The similarity to OneNote is: the program will have 3 types of tabs, just like OneNote:
The left column tabs are the "books"
The tabs over the page are the "dividers"
The right column tabs are the "pages"
Now, that means that each page would be the equivalent of a notepad .txt file.
And that means I'd be able to more easily categorize "kinds" of files into dividers and books (url dumps, general notes, reminders, etc would be different dividers or books, depending on my mood and the way I organize each category)
My question is: what's the best way to store all that data?
In other words, what's the best way to store all those different text files in just one big file?
I know that asking "what is the best" is kind of empty.
That's why I'd like to admit how little I know about the subject and explain the issues I'm considering.
Please keep in mind that I know nothing about data storage, so this is probably going to be very, very lame.
I was weighing the pros and cons of XML versus databases.
With XML, I'd be able to store files of all sorts of sizes (2 MB, 10 kB...) without "wasting" space, because I would not be pre-allocating any space the way databases do (to the best of my knowledge, which is zero).
But variable sizes would mean slow lookups, because I wouldn't be able to do binary tree searches, given that I wouldn't have any pointers to the files.
That would also mean it would be better to keep files in RAM when working on several different "pages" at once, because it would be slow to open and reopen files (there's no list with pointers to all the files).
With databases, OTOH, I'd be allocating fixed sizes, so that would mean faster lookups and no need to keep files in RAM while working on several of them, because opening and reopening files would be fast.
The downside is that I'd be wasting disk space, and I'd have no idea how much to allocate per file anyway (there's no typical size for the files I save).
So, is there a way to have:
Variable size for files
Still be able to access files reasonably fast so that I don't have to keep the files I'm working on all in RAM
Fast look up
and a way to make it work well with the kind of hierarchy that I mentioned (books contain dividers, dividers contain pages, pages are the equivalent to notepad .txt files)
Or am I asking too much?
How would you make this work?
My goal is to have only two files and to use a very reliable database system (reliability is the top priority, far ahead of performance); anything else is extra.
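A minimal sketch of how those requirements could fit together, assuming SQLite through Python's standard sqlite3 module (the question itself does not name any database, so this choice is only an illustration). The whole notebook lives in one file, page bodies are variable-length TEXT, the id and UNIQUE columns give indexed lookups, and each save runs in a transaction. The table names and the helpers open_notebook, add_page and search are hypothetical.

```python
import re
import sqlite3

# Hypothetical schema: books contain dividers, dividers contain pages.
SCHEMA = """
CREATE TABLE IF NOT EXISTS books (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS dividers (
    id      INTEGER PRIMARY KEY,
    book_id INTEGER NOT NULL REFERENCES books(id),
    name    TEXT NOT NULL,
    UNIQUE (book_id, name)
);
CREATE TABLE IF NOT EXISTS pages (
    id         INTEGER PRIMARY KEY,
    divider_id INTEGER NOT NULL REFERENCES dividers(id),
    title      TEXT NOT NULL,
    body       TEXT NOT NULL DEFAULT '',  -- variable length, nothing pre-allocated
    UNIQUE (divider_id, title)
);
"""


def _regexp(pattern, text):
    # SQLite evaluates "X REGEXP Y" by calling regexp(Y, X).
    return int(re.search(pattern, text or "") is not None)


def open_notebook(path):
    """Open (or create) the single notebook file at `path`."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.create_function("REGEXP", 2, _regexp)
    return conn


def add_page(conn, book, divider, title, body):
    """Create or overwrite one page, creating its book and divider if needed."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("INSERT OR IGNORE INTO books (name) VALUES (?)", (book,))
        book_id = conn.execute(
            "SELECT id FROM books WHERE name = ?", (book,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO dividers (book_id, name) VALUES (?, ?)",
            (book_id, divider),
        )
        divider_id = conn.execute(
            "SELECT id FROM dividers WHERE book_id = ? AND name = ?",
            (book_id, divider),
        ).fetchone()[0]
        conn.execute(
            "INSERT OR REPLACE INTO pages (divider_id, title, body) VALUES (?, ?, ?)",
            (divider_id, title, body),
        )


def search(conn, pattern):
    """Return (book, divider, page title) for every page whose body matches."""
    rows = conn.execute(
        """
        SELECT b.name, d.name, p.title
        FROM pages p
        JOIN dividers d ON d.id = p.divider_id
        JOIN books b ON b.id = d.book_id
        WHERE p.body REGEXP ?
        ORDER BY b.name, d.name, p.title
        """,
        (pattern,),
    )
    return rows.fetchall()
```

Calling open_notebook("notes.db") (a made-up file name) would create the one data file next to the program. The regex search scans every page body, which should be acceptable for plain-text notes but is not an indexed lookup.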
Thanks in advance.
I actually wanted to write this program so I could put it on my pendrive and carry all my documents around with me, and also have an interface that lets me easily browse all of them.
But I was thinking: maybe I could just format my pendrive as NTFS with 512-byte clusters. That way I could just put the files in folders and subfolders and have them there without any database.
The whole reason I was going with a database solution was to avoid wasting cluster space. I'm going to be saving pretty small files, some of them with just one URL in them, so that would waste a lot of space with the regular 4096-byte clusters, but not with 512-byte clusters... what do you think?
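The cluster-size trade-off can be estimated with a little arithmetic. The sketch below assumes a simplified model in which every non-empty file occupies a whole number of clusters; real NTFS can store very small files inside the MFT record itself, so actual waste may be lower. The function names and the sample file sizes are made up for illustration.

```python
def allocated_bytes(file_size, cluster_size):
    """Bytes occupied on disk when space is handed out in whole clusters."""
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size


def slack(file_sizes, cluster_size):
    """Total bytes allocated but unused across all files."""
    return sum(allocated_bytes(size, cluster_size) - size for size in file_sizes)


# Hypothetical notes: a single URL, a short reminder, a longer text file.
sizes = [60, 300, 5000]
print(slack(sizes, 4096))  # 11024 bytes wasted with 4096-byte clusters
print(slack(sizes, 512))   # 784 bytes wasted with 512-byte clusters
```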
Edited by WingedPanther, 11 April 2009 - 03:42 AM.
Double post

