Thread: Data extraction.

  1. #1
    joe1986 (Newbie; joined Sep 2007; 5 posts)

    Data extraction.

    Hi there, I'm new to Perl scripting, but I've been told it's quite straightforward to pick up. Anyway, I'm looking to write a small script that can be run from Explorer and that can extract specific info from a page and export that info into a Word document or Excel file. Any pointers that could get me started?
    Cheers
    Joe

  3. #2
    KevinADC (Programmer; joined Jan 2007; 125 posts)
    Extract info from a page on the internet? You don't typically use a browser for more than invoking a Perl script; that would typically be a CGI script. This forum is an example of the same kind of server-side script, written using PHP.

    The server runs the code; the browser just displays the formatted output from the script.
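
To make the server/browser split concrete, here is a minimal sketch of a CGI script in Perl. It assumes a web server that is configured to execute the file as CGI; `render_page` and the sample title are made-up names for illustration only, and the sketch does no HTML escaping of its inputs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the complete CGI response (header plus HTML body) as one string.
# render_page is a hypothetical helper used only in this sketch.
sub render_page {
    my ($title, @items) = @_;
    my $html = "<html><head><title>$title</title></head><body>\n";
    $html .= "<h1>$title</h1>\n<ul>\n";
    $html .= "<li>$_</li>\n" for @items;
    $html .= "</ul>\n</body></html>\n";
    # A CGI response starts with a Content-type header and a blank line.
    return "Content-type: text/html\n\n" . $html;
}

# The server executes this line; the browser only receives the printed text.
print render_page('Server time', scalar localtime);
```

The browser never sees the Perl source, only whatever the script prints.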

    Reading a good Perl tutorial is my best pointer for now:

    Beginning Perl - perl.org

    Worry about your specific program requirements once you understand some basics.

  4. #3
    jonmacpherson (Newbie; joined Dec 2007; 7 posts)
    Well... here's a real-world example of a very similar program, which controls Word and combines articles. Your mileage may vary. You may use all, part, or none of this script.

    I learned how to write the following program by looking at Word-controlling programs others had written.


    #!c:\perl56\bin\perl
    #
    # Combine Articles using MS Word, and save them back to their original queue.



    BEGIN {

    use Cwd;
    use CGI::Carp qw(fatalsToBrowser);
    use Win32::OLE;
    use ANPA;



    require "cgi-lib.pl";
    require "Gamma.pm";
    require "Security2.pm";
    require "cn4.lib";
    require "Process.pm";
    require "SubData.pm";
    require "QueueAccess.pm";

    $folder = "o:\\combine\\";
    $sys_Universal_Prefix = "o:\\";

    $InchLengthMacro = "Normal.Module1.GetInches";


    }

    print 'Content-type: text/html', "\n\n";

    $data = new Gamma::Process();

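    # Start Word via OLE; report the OLE error (not $1) if it cannot be created.
    my $wrd = Win32::OLE->new("Word.Application") or die "Cannot start Word: " . Win32::OLE->LastError();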
    $wrd->{'Visible'} = 1;

    %lingo = (
    'article.rec_type' => 0, # not used
    'article.category' => 1, #
    'article.date' => 2,
    'article.add_date' => 3,
    'article.add_time' => 4, # Used by autopurger to delete old articles
    'article.exp_date' => 5,
    'article.owner' => 6,
    'article.active' => 7,
    'article.title' => 8,
    'article.author' => 9,
    'article.image' => 10, # In Stone.
    'article.photocap' => 11,
    'article.template' => 12, #
    'article.priority' => 13,
    'article.intro' => 14,
    'article.story' => 15,
    'article.notes' => 16,
    'article.relevency' => 17
    );


    $data->_fill_lev1( { 'lingo_record' => \%lingo } );





    chdir ($folder);

    opendir (DIR, $folder) || die "cannot open $folder due to $!";

    @LIST = readdir DIR;


    foreach $file (@LIST){

    # Only look for cmb files.
    if ($file =~ m/\.cmb$/ig){
    print $file;
    &combineArticles($file);

    }


    }


    sub combineArticles {

    my ($file) = @_;

    $fullFile = $folder . $file;
    $destQueue = $file;
    $destQueue =~ s#-Q-F-.*##igs;
    $destFile = $file;
    $destFile =~ s#^.*-Q-F-##igs;
    $destFile =~ s#\.cmb$##igs;
    $slug = $destFile;
    $destFile = $sys_Universal_Prefix . $destQueue . "/" . $destFile . ".doc";

    $sys_record = $sys_Universal_Prefix . $destQueue . "/" . "records.gamma";


    print $fullFile;
    open (CMBInstr, "$fullFile")|| die "Cannot open $fullFile due to $!";
    @FILENAmes = <CMBInstr>;
    close CMBInstr;


    my $ToDoc = $wrd->Documents->Add;

    foreach $file (@FILENAmes){

    if ($file =~ m/\w/ig){

    $file = $file . ".doc";
    $file = $sys_Universal_Prefix . $destQueue . "/" . $file;
    $file =~ s#\n##igs;

    my $doc = $wrd->Documents->Open( $file ) || die "Cannot open $file due to " . Win32::OLE->LastError();
    $doc->Content->Copy;
    print "Copying Contents of $file \n";
    $doc->Close();

    print "Removing Temporary File $file \n";
    #system (" del \"$file\" ");

    print "Pasting Contents of $file into \n \t $destFile \n";
    $wrd->Run('Normal.NewMacros1.PasteText');


    }

    }

    $ToDoc->SaveAs($destFile);

    $wrd->Run($InchLengthMacro);

    $ToDoc->Close();

    system(" del \"$fullFile\" ");

    $t = time();


    my $InchesDataFile = $destFile . ".count";
    open (INCHFILE, $InchesDataFile) || warn "Cannot open $InchesDataFile due to $!";
    my (@Counttainer) = <INCHFILE>;
    close INCHFILE;

    unlink($InchesDataFile);

    my $InchLenghtIn = join ('', @Counttainer);
    $InchLenghtIn =~ s#\n##igs;

    {

    # Get the current date, and chop out the parts that are unwanted.
    # I only want the month day of the month and time
    # example: jul 19 14:23:06

    my ($DateString) = "" . localtime(time());

    my ($dweek, $mon, $dmon, $time, $yr) = split (' ', $DateString );

    $DesiredDateString = "$mon $dmon $time";

    }
    $data->read($sys_record, 'record');

    print "Writing changes to $sys_record \n\n";
    print "Slug \t\t $slug \n";
    print "Date \t\t $DesiredDateString \n";
    print "TimeStamp \t $t \n";
    print "Inches \t\t $InchLenghtIn Inches \n";
    print "Queue \t\t $destQueue \n";

    $data->_fill_lev1({'object' => 'record'});
    $data->_fill_lev2('files', { 'record' => $sys_record });
    $data->_fill_lev2('formdata', { 'record_index' => $slug,
    'article.title' => $slug,
    'article.date' => $DesiredDateString,
    'article.add_date' => $t,
    'article.add_time' => $t,
    'article.template' => $destQueue,
    'record.index' => $slug,
    'article.intro' => '',
    'article.notes' => $InchLenghtIn . " Inches",
    'article.owner' => ""
    } );

    $data->_auto_save_any();
    $data->write( $sys_record, 'record');

    }

    $wrd->Quit;
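
The filename handling and date trimming in the script above can be tried on their own, without Word or the site-specific modules. The following is a minimal standalone sketch, assuming instruction files are named `<queue>-Q-F-<slug>.cmb` as the substitutions in `combineArticles` suggest; `split_cmb_name`, `short_date`, and the sample filename are hypothetical names, not part of the original script.

```perl
use strict;
use warnings;

# Split a combine-instruction filename of the assumed form
# <queue>-Q-F-<slug>.cmb into its queue and slug parts,
# using the same substitutions as combineArticles.
sub split_cmb_name {
    my ($file) = @_;
    (my $queue = $file) =~ s/-Q-F-.*//is;    # keep text before -Q-F-
    (my $slug  = $file) =~ s/^.*-Q-F-//is;   # keep text after -Q-F-
    $slug =~ s/\.cmb$//i;                    # drop the extension
    return ($queue, $slug);
}

# Reduce localtime's "Thu Jul 19 14:23:06 2007" style string to
# month, day of month, and time, as the script does.
sub short_date {
    my ($epoch) = @_;
    my ($dweek, $mon, $dmon, $time, $yr) = split ' ', scalar localtime($epoch);
    return "$mon $dmon $time";
}

my ($queue, $slug) = split_cmb_name('sports-Q-F-game1.cmb');
print "queue=$queue slug=$slug\n";   # prints "queue=sports slug=game1"
print "date=", short_date(time()), "\n";
```

The destination document would then be the queue folder plus the slug with a `.doc` extension, which is how the script builds `$destFile`.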

  5. #4
    KevinADC (Programmer; joined Jan 2007; 125 posts)
    jon,

    Watch the post dates: this thread is several months old, and the OP posted this one question and never returned. But of course I only offer that as a suggestion; you are free to post replies in any thread you wish.

    --Kevin

  6. #5
    jonmacpherson (Newbie; joined Dec 2007; 7 posts)

    Thanks

    Thanks, Kevin.

    Didn't even notice.
