HTML formatter / crawler
Started by jerro, Mar 09 2009 01:14 PM
2 replies to this topic
#1
Posted 09 March 2009 - 01:14 PM
Hello,
For a project, I need to crawl a web page and extract information from it. The page is, of course, in HTML. Does anyone know of a good tool that lets me specify something like "HTML regular expressions", or any other kind of rule, for extracting information from HTML pages? For example, I would like to tell the tool that everything between <div class=product_price>xxx</div> should be written to the XML output as <price>xxx</price> at a certain position in the XML document.
Thanks.
#2
Posted 09 March 2009 - 01:26 PM
Regular expressions can be used to extract information from an HTML document, as can a text processor. It depends a LOT on what exactly you're trying to do, what languages you're familiar with already, and how automated it has to be.
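To make the regular-expression route concrete, here is a minimal sketch in Python (chosen only for illustration; the thread does not name a language). It assumes the price div contains plain text with no nested div, and the sample markup, the `prices_to_xml` helper, and the `<prices>` wrapper element are all hypothetical.

```python
import html
import re
from xml.sax.saxutils import escape

# Hypothetical sample markup; only the class name comes from the question.
SAMPLE_PAGE = (
    '<div class=product_price>19.99</div>'
    '<div class="product_price"> 5.00 </div>'
)

# Match a div whose class attribute is exactly product_price (quoted or
# unquoted) and lazily capture everything up to the next closing </div>.
# Nested divs or extra attributes are NOT handled by this pattern.
PRICE_RE = re.compile(
    r'<div\s+class=(["\']?)product_price\1\s*>(.*?)</div>',
    re.IGNORECASE | re.DOTALL,
)

def prices_to_xml(page):
    """Return an XML string with one <price> element per matching div."""
    prices = [m.group(2).strip() for m in PRICE_RE.finditer(page)]
    # Decode HTML entities first, then escape the text again for XML.
    items = "".join(
        "<price>{}</price>".format(escape(html.unescape(p))) for p in prices
    )
    return "<prices>{}</prices>".format(items)

print(prices_to_xml(SAMPLE_PAGE))
```

This tends to work only on pages whose markup is very regular. As soon as the div can contain other tags or carry more attributes, a real HTML parser is usually the safer choice.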
#3
Posted 09 March 2009 - 05:04 PM
Look up the PHP DOM project.
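The same parser-based idea can be sketched outside PHP as well. The following uses only Python's standard library (`html.parser` and `xml.etree.ElementTree`), not the PHP DOM API itself. The `PriceExtractor` class, the `html_to_xml` helper, and the `<product>` root element are assumptions made for illustration; only the `product_price` class and the `<price>` element come from the question.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

class PriceExtractor(HTMLParser):
    """Collect the text inside every <div class="product_price">."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # div nesting depth inside a match; 0 = outside
        self.buffer = []    # text fragments of the div currently being read
        self.prices = []    # finished price strings

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Already inside a matching div: track nested divs so the
            # right closing tag ends the capture.
            if tag == "div":
                self.depth += 1
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if "product_price" in classes:
                self.depth = 1
                self.buffer = []

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.prices.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)

def html_to_xml(page):
    """Return an XML string with one <price> per matching div."""
    parser = PriceExtractor()
    parser.feed(page)
    parser.close()
    root = ET.Element("product")  # hypothetical root element
    for text in parser.prices:
        ET.SubElement(root, "price").text = text
    return ET.tostring(root, encoding="unicode")

SAMPLE_PAGE = (
    '<html><body><div class=product_price><b>19.99</b> EUR</div>'
    '<div class="other">ignored</div></body></html>'
)
print(html_to_xml(SAMPLE_PAGE))
```

Unlike a regular expression, this copes with unquoted attributes, extra classes, and tags nested inside the price div, and `ElementTree` takes care of escaping the text when it writes the XML.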

