HTML formatter / crawler

This topic has been archived. This means that you cannot reply to this topic.
2 replies to this topic

#1
jerro

    Newbie

  • Members
  • Pip
  • 1 posts
Hello,

For a project, I need to crawl a certain web page and then extract information from it. The page is, of course, in HTML. Does anyone know of a good tool that lets me specify something like "HTML regular expressions", or any other kind of rule, to extract information from HTML pages? For example, I would like to tell the tool that everything between <div class=product_price>xxx</div> should be rewritten as <price>xxx</price> at a given position in the output XML.

Thanks.

#2
WingedPanther

    A spammer's worst nightmare

  • Moderators
  • 16,831 posts
Regular expressions can be used to extract information from an HTML document, as can a text processor. It depends a LOT on what exactly you're trying to do, what languages you're familiar with already, and how automated it has to be.
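As a rough illustration of the regular-expression route, here is a minimal sketch in Python (the thread does not name a language, so that choice is an assumption). The sample page, the `<products>` wrapper element, and the function names are hypothetical; only the `product_price` class and the `<price>` element come from the question. It only works when the markup is regular and the price div has no nested divs.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample page; only the product_price class comes from the question.
html = """
<div class=product_name>Widget</div>
<div class=product_price>19.99</div>
<div class="product_price"> 5.00 </div>
"""

# Match <div class=product_price>...</div>, with or without quotes
# around the class value. The non-greedy (.*?) stops at the first </div>.
PRICE_RE = re.compile(
    r'<div\s+class=["\']?product_price["\']?\s*>(.*?)</div>',
    re.IGNORECASE | re.DOTALL,
)

def extract_prices(page):
    """Return the stripped text of every matching price div."""
    return [m.strip() for m in PRICE_RE.findall(page)]

def prices_to_xml(prices):
    """Wrap each price in a <price> element under an assumed <products> root."""
    root = ET.Element("products")
    for p in prices:
        ET.SubElement(root, "price").text = p
    return ET.tostring(root, encoding="unicode")

prices = extract_prices(html)
xml_out = prices_to_xml(prices)
print(xml_out)
```

A pattern like this breaks as soon as the div carries extra attributes or contains another div, which is where a real HTML parser becomes the safer choice.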

#3
TkTech

    The Crazy One

  • Moderators
  • 1,396 posts
Look up the PHP DOM project.
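The parser-based idea can be sketched as follows. This is not PHP DOM itself: it uses Python's standard `html.parser` module as a stand-in to show the same approach of walking the parsed document instead of pattern-matching raw text. The `PriceExtractor` class, the sample page, and the `<products>` root element are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

class PriceExtractor(HTMLParser):
    """Collect the text inside every <div> whose class list has product_price."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # div nesting depth inside a matching div; 0 = outside
        self.buffer = []    # text fragments of the div currently being read
        self.prices = []    # finished price strings

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            # A nested div inside a price div: track it so we close correctly.
            self.depth += 1
        elif "product_price" in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.prices.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)

def html_to_xml(page):
    """Parse the page and emit <price> elements under an assumed <products> root."""
    parser = PriceExtractor()
    parser.feed(page)
    parser.close()
    root = ET.Element("products")
    for p in parser.prices:
        ET.SubElement(root, "price").text = p
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample page with nested markup and a multi-class div.
page = """<html><body>
<div class="product"><div class=product_price><span>$</span>19.99</div></div>
<div class="product_price sale">5.00</div>
</body></html>"""

result = html_to_xml(page)
print(result)
```

Unlike a regular expression, this copes with unquoted attributes, extra classes, and inline tags inside the price div. In PHP, the DOM extension's `DOMDocument::loadHTML` together with `DOMXPath` should allow a similar query-based extraction.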