Jump to content

Advice on importing spreadsheet data into a relational DBMS

- - - - -

  • Please log in to reply
3 replies to this topic

#1
Kaishain

Kaishain

    Newbie

  • Members
  • Pip
  • 6 posts
Good afternoon,

I could do with a bit of advice on the best way to tackle a small problem in my newest project.

Each month I'm going to receive a spreadsheet of records. I need to remove any data (columns) from the sheet that I don't need and then import the cleansed data into a table in a relational database.

I've already identified a few options for carrying out these tasks, but I can't decide which would be best, so any advice would be greatly appreciated!


The first option is to have a Perl script open the spreadsheet and remove the data that I don't need.

From there, I can then have the same Perl script generate some SQL and then connect to the database and execute the SQL itself (a bit like this: Calling Relational Databases from Perl).

Alternatively, I could have the Perl script generate some XML or SQL and store it in a separate file which the DBMS would then import/run.

Or another alternative would be to have the Perl script generate the XML and then use Java to implement XML-DBMS (rpbourret.com - XML-DBMS: Middleware for Transferring Data between XML Documents and Relational Databases).

Lastly, I could just use JDBC in a Java application to open the spreadsheet, cleanse it and then connect to the DBMS and execute SQL against it.


These are my initial ideas, but I'm open to suggestion! I have experience with Perl, Java and SQL so it'd be nice to stick to just those, if possible.

It's also imperative that the data is added to the database successfully (I'm working with mission critical data, so I can't afford any inaccuracies, duplicate entries, etc.). With that in mind, I thought it might be an idea to do something along the lines of creating a stored procedure to try to commit/rollback the cleansed data, so I could ensure that the data is added accurately and in its entirety.

I'm also not completely sure which relational DBMS I'm going to use. I think it would be best to use MS SQL (as my client runs Windows), but I've been warned about certain fees/licenses attached to some connection drivers which I'd like to avoid.


Sorry that's a bit long winded, but any suggestions would be really helpful at this stage!

Thanks!
----
There's no place like 127.0.0.1

#2
WingedPanther

WingedPanther

    A spammer's worst nightmare

  • Moderators
  • 16,831 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others
Another option: since it's in a spreadsheet, create a new column that has a formula that creates an insert statement for that row. You can then copy/paste that column into a query analyzer and run it.
Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

#3
Kaishain

Kaishain

    Newbie

  • Members
  • Pip
  • 6 posts
That's a cunning idea - I like it!

The only problem with doing something like that is that I wouldn't be the one to run it each month. Instead, someone else who isn't overly computer literate might be running it, so I'd like to automate it for them as much as possible to reduce the chance of errors being introduced.
----
There's no place like 127.0.0.1

#4
WingedPanther

WingedPanther

    A spammer's worst nightmare

  • Moderators
  • 16,831 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others
The formula is copy/paste, and can be made smart enough to check column labels.
Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog




1 user(s) are reading this topic

0 members, 1 guests, 0 anonymous users