
How to strip HTML tags, scripts, and styles from a web page

10 replies to this topic

#1
sergo

    Newbie

  • Members
  • 7 posts
I found this script at hxxp://nadeausoftware.com/articles/2007/09/php_tip_how_strip_html_tags_web_page. It is supposed to strip HTML tags, scripts, and styles from a web page, but it doesn't.

What is the problem?

#2
Jacki

    Learning Programmer

  • Members
  • 80 posts
I suggest using the strip_tags() function from the PHP library. That's the easiest way. Bye!

#3
sergo

    Newbie

  • Members
  • 7 posts
The strip_tags() function is useless.

#4
zeroradius

    Speaks fluent binary

  • Members
  • 1,406 posts
No, it's not. It works perfectly.

#5
Guest_Jordan_*
  • Guests
Why do you say the strip_tags function is useless? It will do exactly what you stated that you need above.

#6
sergo

    Newbie

  • Members
  • 7 posts
I have tried it and it doesn't work. It returns the tags, not the text without the tags.

#7
Jacki

    Learning Programmer

  • Members
  • 80 posts

sergo said:

I have tried it and it doesn't work. It returns the tags, not the text without the tags.
It works! Post the code where you've used strip_tags().
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";

// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>
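
One thing to watch for: strip_tags() removes only the tags themselves, so the text inside <script> and <style> elements stays in the result. That may be why the function looks like it "doesn't work" on a full web page. A minimal sketch, where the $html sample and the variable names are made up for illustration:

```php
<?php
// Hypothetical sample markup, not taken from sergo's page.
$html = '<style>p { color: red; }</style><p>Hello</p><script>alert("hi");</script>';

// strip_tags() drops the tags but keeps everything between them,
// including the CSS rule and the JavaScript statement.
$plain = strip_tags($html);

echo $plain, "\n"; // prints: p { color: red; }Helloalert("hi");
```

So for a whole page, the script and style blocks have to be removed before strip_tags() is called.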



#8
chili5

    Writes binary right handed and hex left handed

  • Members
  • 7,247 posts

sergo said:

I have tried it and it doesn't work. It returns the tags, not the text without the tags.


It works just fine. Post your code. You are probably using it wrong, or you misunderstand what the function does: it is working, but you think it isn't.

#9
sergo

    Newbie

  • Members
  • 7 posts
How do I use this to remove something like ("something"); or Blaf Blab?

<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";

// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>


#10
Jacki

    Learning Programmer

  • Members
  • 80 posts

sergo said:

How do I use this to remove something like this ("something"); or Blaf Blab ?

<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";

// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?> 
strip_tags() only strips HTML tags. If you want to strip anything else, you have to use another function. For example:
$string = str_replace("something", "", $string);
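
To make that concrete, here is a rough sketch that chains the two steps: strip the tags first, then remove the literal fragments. The input string is invented for illustration, since the thread does not show the real data, and the whitespace cleanup at the end is an optional extra:

```php
<?php
// Made-up input for illustration only.
$string = 'Blaf Blab <b>Hello</b> ("something"); world';

// 1. Remove the HTML tags.
$string = strip_tags($string);

// 2. Remove the literal fragments. str_replace() accepts an array of needles.
$string = str_replace(array('("something");', 'Blaf Blab'), '', $string);

// 3. Collapse the leftover whitespace and trim both ends.
$string = trim(preg_replace('/\s+/', ' ', $string));

echo $string, "\n"; // prints: Hello world
```

str_replace() matches exact text only. If the fragment varies from page to page, preg_replace() with a pattern is the usual alternative.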


#11
Guest_Jordan_*
  • Guests
Try this. I found it in the user-submitted comments in the PHP manual:
<?php
function html2txt($document) {
    // Patterns run in order: remove script/style blocks and comments
    // (with their content) before stripping the remaining tags.
    $search = array(
        '@<script[^>]*?>.*?</script>@si',  // Strip out javascript
        '@<style[^>]*?>.*?</style>@si',    // Strip style blocks and their content
        '@<![\s\S]*?--[ \t\n\r]*>@',       // Strip multi-line comments
        '@<[\/\!]*?[^<>]*?>@si'            // Strip out remaining HTML tags
    );
    $text = preg_replace($search, '', $document);
    return $text;
}
?>
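
For comparison, a shorter variant of the same idea: remove the script and style blocks with one pattern, then let strip_tags() handle the remaining tags and comments. This is only a sketch. The function name page_to_text and the sample page are hypothetical, and a regex will not cope with every malformed page:

```php
<?php
// Hypothetical helper name; not from the thread or the PHP manual.
function page_to_text($html) {
    // Drop <script> and <style> elements together with their content.
    $html = preg_replace('@<(script|style)\b[^>]*>.*?</\1>@si', '', $html);
    // Remove the remaining tags and HTML comments.
    $text = strip_tags($html);
    // Collapse runs of whitespace and trim both ends.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Made-up sample page for illustration.
$page = '<html><head><style>p { color: red; }</style></head>'
      . '<body><p>Hello</p> <script>alert("hi");</script>'
      . '<!-- note --><p>world</p></body></html>';

echo page_to_text($page), "\n"; // prints: Hello world
```

The order is the important part: once the tags are gone, nothing marks where the script or style content started, so those blocks have to be removed first.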