
Thread: need to convert html file to excel, what is the best way?

  1. #1
    fftw_ayi is offline Newbie
    Join Date
    Feb 2009
    Posts
    3
    Rep Power
    0

    need to convert html file to excel, what is the best way?

    Hello, first time on the forum.
    I have to convert a bunch of html files (hundreds of them) in a certain folder into excel files. What would be the best programming language to do this?

    Once it's done, I plan to run the code on schedule on a monthly basis, unattended.

  3. #2
    Join Date
    Jul 2006
    Posts
    16,494
    Blog Entries
    75
    Rep Power
    143

    Re: need to convert html file to excel, what is the best way?

    A batch file that renames them from XXX.html to XXX.xls would be my first inclination.
    Programming is a branch of mathematics.
    My CodeCall Blog | My Personal Blog

  4. #3
    fftw_ayi is offline Newbie
    Join Date
    Feb 2009
    Posts
    3
    Rep Power
    0

    Re: need to convert html file to excel, what is the best way?

    Quote Originally Posted by WingedPanther View Post
    A batch file that renames them from XXX.html to XXX.xls would be my first inclination.
    Simply renaming it does not make it an excel file. If I were doing this manually, I would have to open the html file in excel and do a Save As while explicitly choosing .xls for the Save as type.

  5. #4
    Join Date
    Jul 2006
    Posts
    16,494
    Blog Entries
    75
    Rep Power
    143

    Re: need to convert html file to excel, what is the best way?

    Are you trying to convert it to a binary Excel file, or just get it to open in Excel? Excel will open and render HTML as a spreadsheet (this trick is done a LOT by web apps that need to serve up Excel reports).

  6. #5
    fftw_ayi is offline Newbie
    Join Date
    Feb 2009
    Posts
    3
    Rep Power
    0

    Re: need to convert html file to excel, what is the best way?

    Quote Originally Posted by WingedPanther View Post
    Are you trying to convert it to a binary Excel file, or just get it to open in Excel? Excel will open and render HTML as a spreadsheet (this trick is done a LOT by web apps that need to serve up Excel reports).
    You are correct. I need it to be a binary Excel file. If I simply rename it, I can double click it and the file will open in excel, but it is not an excel file.
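
The difference between "opens in Excel" and "is an Excel file" can be checked from the first bytes of the file. Below is a minimal Python sketch, not something posted in this thread: `sniff_format` is a made-up helper name, and it assumes legacy binary .xls workbooks use the OLE2 compound file container (which other old Office formats also use) while .xlsx workbooks are ZIP archives.

```python
# Signature of the OLE2 compound file container used by legacy binary .xls workbooks.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# .xlsx workbooks are ZIP archives, which start with "PK\x03\x04".
ZIP_MAGIC = b"PK\x03\x04"


def sniff_format(path):
    """Guess the real container format of a file from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(512)
    if head.startswith(OLE2_MAGIC):
        return "ole2"  # possibly a binary .xls (other Office formats share this container)
    if head.startswith(ZIP_MAGIC):
        return "zip"  # possibly an .xlsx
    lowered = head.lower()
    if b"<html" in lowered or b"<table" in lowered:
        return "html"  # HTML text, whatever the extension says
    return "unknown"
```

A renamed XXX.html would come back as "html" even with an .xls extension, which is the situation described above.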

  7. #6
    Join Date
    Jul 2006
    Posts
    16,494
    Blog Entries
    75
    Rep Power
    143

    Re: need to convert html file to excel, what is the best way?

    You could write a utility in Java that uses the Apache POI library to convert the HTML to native Excel. You could also do something similar with a .NET utility. Getting it to run on a schedule will depend somewhat on the OS.
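
For comparison, here is a rough Python sketch of writing a native workbook without POI or .NET. It is a stand-in, not the POI approach: it writes the newer .xlsx (Office Open XML) format rather than the legacy binary .xls, uses only the standard library, and stores every cell as text. `write_xlsx` is a hypothetical helper name, and the sketch assumes the table rows have already been extracted from the HTML.

```python
import zipfile
from xml.sax.saxutils import escape

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
XML_DECL = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'

# Fixed package parts that every one-sheet workbook needs.
PARTS = {
    "[Content_Types].xml": (
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
        '<Default Extension="xml" ContentType="application/xml"/>'
        '<Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>'
        '<Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>'
        '</Types>'
    ),
    "_rels/.rels": (
        f'<Relationships xmlns="{PKG_REL_NS}">'
        f'<Relationship Id="rId1" Type="{REL_NS}/officeDocument" Target="xl/workbook.xml"/>'
        '</Relationships>'
    ),
    "xl/workbook.xml": (
        f'<workbook xmlns="{MAIN_NS}" xmlns:r="{REL_NS}">'
        '<sheets><sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets>'
        '</workbook>'
    ),
    "xl/_rels/workbook.xml.rels": (
        f'<Relationships xmlns="{PKG_REL_NS}">'
        f'<Relationship Id="rId1" Type="{REL_NS}/worksheet" Target="worksheets/sheet1.xml"/>'
        '</Relationships>'
    ),
}


def write_xlsx(path, rows):
    """Write rows (lists of values) to a one-sheet .xlsx; every cell is stored as text."""
    body = []
    for row in rows:
        cells = "".join(
            f'<c t="inlineStr"><is><t xml:space="preserve">{escape(str(v))}</t></is></c>'
            for v in row
        )
        body.append(f"<row>{cells}</row>")
    sheet = f'<worksheet xmlns="{MAIN_NS}"><sheetData>{"".join(body)}</sheetData></worksheet>'
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as z:
        for name, xml in PARTS.items():
            z.writestr(name, XML_DECL + xml)
        z.writestr("xl/worksheets/sheet1.xml", XML_DECL + sheet)
```

Calling `write_xlsx("report.xlsx", [["Name", "Qty"], ["Widget", 3]])` produces a two-row sheet. Numbers end up as text cells here; if a true binary .xls or typed cells are required, a dedicated library such as POI is the safer route.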

  8. #7
    JenniC is offline Newbie
    Join Date
    Mar 2009
    Posts
    6
    Rep Power
    0

    Re: need to convert html file to excel, what is the best way?

    You have an html file in a local folder. If you are trying to convert a <table> from this file to a spreadsheet file, use biterscripting.

    Read the file into a str variable $html.

    var str html ; cat "C:/somefolder/somefile.html" > $html
    $html now has a table starting at <table...> and ending at </table>. (If this file has more than one <table>, see the end of this post.)

    Collect rows one by one into a str variable $rows.

    var str rows
    while ( { sen -r -c "^<tr&</tr\>^" $html } > 0 )
    do
        stex -r -c "^<tr&</tr\>^" $html >> $rows
        echo "\n" >> $rows # End of row
    done
    $rows now has all the rows separated by newlines.

    Collect columns one by one into a str variable $columns.
    Note that $columns will still contain every row; we are only inserting
    a comma after each column within each row, so all rows and all columns
    can be handled in one pass.

    var str columns
    while ( { sen -r -c "^<td&</td\>^" $rows } > 0 )
    do
        stex -r -c "^<td&</td\>^" $rows >> $columns
        echo "," >> $columns # End of column
    done
    $columns now has all rows separated by newline, all columns within each
    row separated by commas.

    $columns still has html tags. Remove them. biterscripting has a sample script for this, SS_WebPageToText.

    echo $columns > "C:/intermediatefile.txt"
    script "C:/Scripts/SS_WebPageToText.txt" page("C:/intermediatefile.txt") > "C:/table.csv"
    C:/table.csv is now a CSV (comma-separated values) file, which can be opened in any spreadsheet program.
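
For readers without biterscripting, the same table-to-CSV idea can be sketched in Python with only the standard library. This is an illustration, not a port of the script above: the names `TableRows` and `html_table_to_csv` are made up, nested tables are not handled, and rows from every table in the input are collected together. One difference worth noting is that the `csv` module quotes cells that themselves contain commas, which plain comma insertion does not.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def _close_cell(self):
        if self._cell is not None:
            # Join fragments and collapse runs of whitespace to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def html_table_to_csv(html):
    """Return the table rows found in html as CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```

Tags inside a cell (links, bold text and so on) are dropped automatically because only text data is collected, so no separate tag-stripping pass is needed.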

    You say you have hundreds of files. Put the above code into a script that takes an input argument $file. (The command cat "C:/somefolder/somefile.html" will become cat $file in the script.) Pass the files to it one by one using the following:

    var str filelist
    lf -rn "*.html" "C:/somefolder" > $filelist
    while ($filelist <> "")
    do
        # Take the next file name off the list.
        var str file ; lex "1" $filelist > $file
        # Call your script with $file here.
    done
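
The same folder loop can be sketched in Python. This is only an illustration: `convert_folder` is a hypothetical helper, `convert` stands for whatever text-to-text conversion you settle on, and the sketch assumes subfolders should be searched too and that each output is written next to its source file.

```python
from pathlib import Path


def convert_folder(src_dir, convert, pattern="*.html", suffix=".csv"):
    """Run convert(text) on every matching file and write the result beside it."""
    written = []
    for src in sorted(Path(src_dir).rglob(pattern)):
        try:
            text = src.read_text(encoding="utf-8", errors="replace")
            dst = src.with_suffix(suffix)
            # newline="" keeps the converter's own line endings unchanged.
            with open(dst, "w", encoding="utf-8", newline="") as f:
                f.write(convert(text))
            written.append(dst)
        except OSError as exc:
            # An unattended run should skip a bad file and keep going.
            print(f"skipped {src}: {exc}")
    return written
```

For the monthly unattended run, a script that calls this function can be started by the operating system's scheduler (Task Scheduler on Windows, cron on Unix-like systems).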
    If a $file contains more than one <table> and you want, say, the second one, extract it using the following.

    cat $file > $html
    # Throw away everything before the second instance of <table .
    stex -c "]^<table^2" $html > null
    # Throw away everything after the immediate next instance of </table>.
    stex -c "^</table>^[" $html > null
    $html is now ready to do the rest of the processing above.
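
A comparable way to pick out one table in Python, again only as a sketch: `nth_table` is a made-up name, the pattern assumes tables are not nested inside each other, and a regular expression is a shortcut here rather than a full HTML parser.

```python
import re

# Shortest span from an opening <table ...> to the next </table>, case-insensitive.
TABLE_RE = re.compile(r"<table\b.*?</table\s*>", re.IGNORECASE | re.DOTALL)


def nth_table(html, n):
    """Return the n-th (1-based) <table>...</table> block, or None if there is none."""
    matches = TABLE_RE.findall(html)
    return matches[n - 1] if 1 <= n <= len(matches) else None
```

`nth_table(html, 2)` returns the second table as a string, which can then go through the same row and column processing as a single-table file.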

    If you don't have biterscripting, get it from biterscripting.com. I think it is still free.

    J
