Good afternoon,
I could do with a bit of advice on the best way to tackle a small problem in my newest project.
Each month I'm going to receive a spreadsheet of records. I need to remove any data (columns) from the sheet that I don't need and then import the cleansed data into a table in a relational database.
I've already identified a few options for carrying out these tasks, but I can't decide which would be best, so any advice would be greatly appreciated!
The first option is to have a Perl script open the spreadsheet and remove the data that I don't need.
From there, I can then have the same Perl script generate some SQL and then connect to the database and execute the SQL itself (a bit like this: Calling Relational Databases from Perl).
Alternatively, I could have the Perl script generate some XML or SQL and store it in a separate file which the DBMS would then import/run.
Or another alternative would be to have the Perl script generate the XML and then use Java to implement XML-DBMS (rpbourret.com - XML-DBMS: Middleware for Transferring Data between XML Documents and Relational Databases).
Lastly, I could just use JDBC in a Java application to open the spreadsheet, cleanse it and then connect to the DBMS and execute SQL against it.
These are my initial ideas, but I'm open to suggestion! I have experience with Perl, Java and SQL so it'd be nice to stick to just those, if possible.
It's also imperative that the data is added to the database successfully (I'm working with mission critical data, so I can't afford any inaccuracies, duplicate entries, etc.). With that in mind, I thought it might be an idea to do something along the lines of creating a stored procedure to try to commit/rollback the cleansed data, so I could ensure that the data is added accurately and in its entirety.
I'm also not completely sure which relational DBMS I'm going to use. I think it would be best to use MS SQL (as my client runs Windows), but I've been warned about certain fees/licenses attached to some connection drivers which I'd like to avoid.
Sorry that's a bit long winded, but any suggestions would be really helpful at this stage!
Thanks!
Advice on importing spreadsheet data into a relational DBMS
Started by Kaishain, Nov 11 2010 07:49 AM
3 replies to this topic
#1
Posted 11 November 2010 - 07:49 AM
----
There's no place like 127.0.0.1
There's no place like 127.0.0.1
|
|
|
#2
Posted 11 November 2010 - 05:09 PM
Another option: since it's in a spreadsheet, create a new column that has a formula that creates an insert statement for that row. You can then copy/paste that column into a query analyzer and run it.
#3
Posted 11 November 2010 - 11:30 PM
That's a cunning idea - I like it!
The only problem with doing something like that is that I wouldn't be the one to run it each month. Instead, someone else who isn't overly computer literate might be running it, so I'd like to automate it for them as much as possible to reduce the chance of errors being introduced.
The only problem with doing something like that is that I wouldn't be the one to run it each month. Instead, someone else who isn't overly computer literate might be running it, so I'd like to automate it for them as much as possible to reduce the chance of errors being introduced.
----
There's no place like 127.0.0.1
There's no place like 127.0.0.1
#4
Posted 12 November 2010 - 07:09 PM
The formula is copy/paste, and can be made smart enough to check column labels.
1 user(s) are reading this topic
0 members, 1 guests, 0 anonymous users


Sign In
Create Account

Back to top









