I'm trying to write some code to extract the content of a given container (e.g. <div id="content">).
All the regex solutions I tried stumbled on the closing </div> elements nested inside the big container.
I started digging into the php.net/dom approach, but the best I got was plain text (all the markup was gone).
Here's my sample code (I actually used the front page of the PHP home page for this test):
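<?php
$url = 'myfavoritesite.net'; // placeholder for the page being fetched
$html = new DOMDocument();
@$html->loadHTMLFile($url); // @ hides the parser warnings
$result = $html->getElementById('content');
$text = utf8_decode($result->nodeValue); // nodeValue holds only the text content
// output the result
echo "<pre>" . $text . "</pre>";
?>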
I really hope there's some genius in here who will be able to enlighten me!
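
For anyone reading along, here is a minimal sketch of one possible way to keep the markup, not a definitive fix. It assumes a PHP version where DOMDocument::saveHTML() accepts a node argument (5.3.6 or later) and serializes each child of the container in turn. The helper name innerHtml and the inline sample markup are made up for illustration; the original test fetched a real page instead.

```php
<?php
// Hypothetical helper (not from the original post): returns the markup
// inside a node by serializing each of its children.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes just that subtree.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Assumed sample markup, standing in for the fetched page.
$sample = '<html><body><div id="content">'
        . '<p>Hello <b>world</b></p><div>nested</div>'
        . '</div></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of using @
$doc->loadHTML($sample);
libxml_clear_errors();

$container = $doc->getElementById('content');
$inner = ($container !== null) ? innerHtml($container) : '';

// Escape the markup so the browser shows the tags instead of rendering them.
echo "<pre>" . htmlspecialchars($inner) . "</pre>";
```

Because the DOM parser tracks nesting itself, the inner </div> elements that trip up a regex are not a problem here. Passing the container itself to saveHTML() would also include the outer <div id="content"> tag, if that is wanted.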











