
Simple LWP'ing and parsing

1 reply to this topic

#1 Mentalbox



Posted 24 June 2011 - 07:17 PM


Temporary goal: being able to run "perl script.pl ARGV" so that the script downloads the HTML of a website specified in the script, where ARGV is the query sent to the website (e.g. for Google the query would be: search?q=ARGV), and prints only the text parts of the page that I want.

Actual work/website/template chosen:
URL: Mestera
(1 of 3) Part(s) with crucial information (HTML): <tr><td>Attack: </td><td><span style='font-weight: small; color: yellow;'>16</span></td></tr>
Wanted text printed from the above: Attack: 16

First I tried a regex. That has worked for me before with LWP.
It failed. I forget why; it didn't work for me, so I gave up.
So I Googled a bit and figured HTML::TreeBuilder sounded fair enough and possibly more suitable.
I used code from demonstrations that I understand and agree with =_=
But it doesn't work.
Now, the current code is:
use LWP;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
my $ua = LWP::UserAgent->new;
my $request = $ua->get("http://www.mestera.net/wikia_view_item_info.php?name=$ARGV[0]");
my $response = $request->decoded_content;
my $response = $tree->parse($response);
my @info = $tree->look_down('_tag', 'span style' => 'font-weight');
for my $info (@info) {
    print $info->as_trimmed_text();
}
This line:
my @info = $tree->look_down('_tag', 'span style' => 'font-weight');
gives me:

param list to look_down ends in a key! at lwp2.pl line 9

If it instead reads:
my @info = $tree->look_down('span style' => 'font-weight');
I get nothing back (blank output, then back at the terminal prompt).
And I tried more variations, like: look_down('_tag', span style => 'font_weight');
And more... and more...
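
A note on the error above: look_down is documented under HTML::Element, which HTML::TreeBuilder inherits from. Its arguments are read as attribute/value pairs (or code references), so '_tag', 'span style' => 'font-weight' is three items, which leaves a key without a value. A minimal sketch of the pair form, assuming the markup looks like the snippet quoted in this post (the inline $html string stands in for the downloaded page):

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample markup copied from the post, so no network access is needed.
my $html = q{<table><tr><td>Attack: </td><td><span style='font-weight: small; color: yellow;'>16</span></td></tr></table>};

my $tree = HTML::TreeBuilder->new_from_content($html);

# Two complete pairs: the tag name must be 'span', and the style
# attribute must match the regex. 'span style' is not an attribute
# name, which is why the one-pair variant matched nothing.
my @info = $tree->look_down(
    _tag  => 'span',
    style => qr/font-weight/,
);

# Collect the text before freeing the tree.
my @texts = map { $_->as_trimmed_text } @info;
print "$_\n" for @texts;    # prints "16"

$tree->delete;
```

All criteria in one call must match the same element, so adding more pairs narrows the result.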

Suggestions/solution anyone?
And better yet, how do I solve these things myself? I've honestly been sitting here pondering this one for hours already, Googling and skimming about 20-35 articles on the subject, with no solution. Maybe I should've read more about HTML::Element, but =_= I think it's really quite silly that I have to go through all of this simply to get the information between these <span></span> tags.
Jesus. I love Perl for making easy things easy, when they indeed do so.
I hate all programming for making anything hard; or worse yet - and the real problem being, making anything impossible to understand with mere logic.
I wish EVERYTHING was dictated by common sense, only on different levels of difficulty. Never "senseless, but works" - that's so silly to me..
*Want some frustration off my back :@*

Edit: solved. I guess I was (too) tired and frustrated ;)
This code got the values I wanted:
use LWP;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
my $ua = LWP::UserAgent->new;
my $request = $ua->get("http://www.mestera.net/wikia_view_item_info.php?name=$ARGV[0]");
my $response = $request->decoded_content;
my $response = $tree->parse($response);
my @info = $tree->look_down('_tag', 'span',
    sub { $_[0]->attr('style') =~ /font-weight: small;/ }
);
for my $info (@info) {
    print $info->as_trimmed_text();
}
Dunno what $_[0] means, or if/how it's different from just $_ -- I'll probably search for this soon enough and find out...
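
On $_[0] versus $_: they are unrelated variables. @_ is the array of arguments passed to a subroutine, so $_[0] is its first argument, and look_down calls the sub with the element being tested as that first argument. $_ is the default "topic" scalar set by for, map and grep. A small sketch using only core Perl (the sub name first_arg is made up for illustration):

```perl
use strict;
use warnings;

# @_ holds the arguments of the current sub call; $_[0] is its first element.
sub first_arg {
    return $_[0];
}

# $_ is a separate scalar: here map sets it to each list item in turn.
my @doubled = map { $_ * 2 } (1, 2, 3);

# Inside first_arg, $_[0] is the argument 'x'; $_ is still the loop's topic.
my @pairs;
for (qw(a b)) {
    push @pairs, first_arg('x') . $_;
}

print first_arg('span'), "\n";    # prints "span"
print "@doubled\n";               # prints "2 4 6"
print "@pairs\n";                 # prints "xa xb"
```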
Also: this isn't exactly a very practical solution.
What do I do now if I want two tags instead..? Two subs? Two separate look_down's? The same sub separated by a comma and/or a newline?
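
One possible answer, as a sketch rather than the only way: use two look_down calls, the second one run on each element found by the first. The example assumes the row markup quoted earlier in this post and prints the label cell and the value cell together:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample markup copied from the post, so no network access is needed.
my $html = q{<table><tr><td>Attack: </td><td><span style='font-weight: small; color: yellow;'>16</span></td></tr></table>};

my $tree = HTML::TreeBuilder->new_from_content($html);

my @lines;
# Outer search: every table row in the document.
for my $row ( $tree->look_down( _tag => 'tr' ) ) {
    # Inner search: only the cells below this particular row.
    my @cells = $row->look_down( _tag => 'td' );
    push @lines, join ' ', map { $_->as_trimmed_text } @cells;
}

print "$_\n" for @lines;    # prints "Attack: 16"

$tree->delete;
```

Because look_down can be called on any element, not just the whole tree, each search can be limited to the part of the page found by the previous one.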

Edit2: I see I'm lucky in this case and can just do:
my @info = $tree->look_down(
    '_tag', 'table',
    sub { $_[0]->attr('width') eq '40%' }
);
But if anyone is reading this thread and thinking "how can I help this guy?", I still want to know: 1) how am I supposed to figure out stuff like this in the future (especially cases that are harder to find information about), other than by excessive Googling and hoping that some newbie like me has already asked about my problem? As far as I can find, these methods (like how look_down works at all) aren't covered in the cpan.org documentation. And 2) how would I have solved this if I hadn't been lucky that the table with the unique 40% width contained exactly, and only, the information I was looking for?
And with use strict and use warnings, I get this warning 6 times:

Use of uninitialized value in string eq at lwp2.pl line 13.
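
A likely cause of that warning: attr('width') returns undef for every table that has no width attribute, and comparing undef with eq warns once per such table. A sketch of two ways to avoid it, using a made-up two-table page (the $html string is an assumption, not the real site's markup):

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical page: one table without a width attribute, one with width="40%".
my $html = q{<table><tr><td>menu</td></tr></table>}
         . q{<table width="40%"><tr><td>Attack: </td><td>16</td></tr></table>};

my $tree = HTML::TreeBuilder->new_from_content($html);

# Option 1: pass the attribute as a plain pair; look_down skips
# elements where the attribute is missing.
my @by_pair = $tree->look_down( _tag => 'table', width => '40%' );

# Option 2: keep the sub, but check for undef before comparing.
my @by_sub = $tree->look_down(
    _tag => 'table',
    sub {
        my $width = $_[0]->attr('width');
        return defined $width && $width eq '40%';
    },
);

my $text = $by_pair[0]->as_trimmed_text;
print "$text\n";    # prints "Attack: 16"

$tree->delete;
```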

I feel like I don't stand a chance @ programming:D

Edited by Mentalbox, 25 June 2011 - 04:36 PM.
specifically, but not generally solved


#2 debtboy



Posted 24 July 2011 - 04:35 PM

Hi Mentalbox,

All your answers can be found in "The Camel Book" (Programming Perl)!!

It may take some time indexing and reading through it, but the rewards are worth it.
IMHO, there is no better language for extracting and manipulating data. ;)

Perl makes easy things easy and difficult things possible

Hint: check out the man pages for perl (it's a book in itself)
