I am writing a Perl program that should do the following.
For example, if I have an HTML file like this:
<b>this is bold.</b><b>This is
bold too</b>
I have to write a program (without using any HTML parser functions) that prints it like this:
<b>this is bold.This is bold too</b>
Basically, it would remove the unnecessary tags.
I may only use regular expressions for it.
My instructor advised me not to read the HTML file line by line, because that would not handle a tag whose opening tag is on one line and whose closing tag is on a later line (as in the file above). It was suggested that I put the whole HTML file into one scalar variable.
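One common way to read a whole file into a single scalar is to undefine the input record separator for the duration of the read. The sketch below is only an illustration: it uses an in-memory filehandle as a hypothetical stand-in for the HTML file, and the name $fileContents is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the HTML file: an in-memory filehandle,
# so the sketch runs without a file on disk.
my $fileContents = "<b>this is bold.</b><b>This is\nbold too</b>\n";
open(my $fh, '<', \$fileContents) or die "Cannot open: $!";

# With the input record separator ($/) undefined, <$fh> returns
# everything up to end-of-file in one read instead of a single line.
my $allHtmlDocument = do { local $/; <$fh> };
close($fh);

print length($allHtmlDocument), " characters read\n";
```

With a real file, the open call would take the file name instead of the scalar reference.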
I have now written the program so that it puts the whole HTML file into one scalar variable. My question is: how do I search for several instances of the <b> and </b> tags in that scalar variable? Should I read it character by character? I am very confused about this part. Please advise. Thanks!
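Reading character by character should not be necessary. A global match (the /g modifier) in a while loop visits every occurrence in the scalar, and pos() reports where each match ended. A minimal sketch, assuming the document is already in $allHtmlDocument (the sample string and the @found array are made up for illustration):

```perl
use strict;
use warnings;

# Sample document, assumed to have been read into one scalar already.
my $allHtmlDocument = "<b>this is bold.</b><b>This is\nbold too</b>";

my @found;
# In scalar context, a /g match resumes where the previous match ended,
# so the loop body runs once per <b> or </b> tag in the string.
while ($allHtmlDocument =~ m{(</?b>)}gi) {
    my $tag   = $1;
    my $start = pos($allHtmlDocument) - length($tag);
    push @found, "$tag at offset $start";
}

print "$_\n" for @found;
```

Because the whole document is one string, a tag pair that spans a line break is found the same way as one on a single line.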
What have you tried so far to figure out the problem?
Hi,
So far I am able to remove the bold tags, changing
<b>abcd</b><b>efgh</b><b>ijkl</b>
to
<b>abcdefghijkl</b>
by using...
$allHtmlDocument =~ s/$endBoldTag(\s*)$startBoldTag//gi;
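For reference, a self-contained version of that substitution might look like the sketch below. The definitions of $startBoldTag and $endBoldTag are assumptions, since the post does not show them, and the sample string is made up. \Q...\E is added so that any regex metacharacters in the tag variables are matched literally.

```perl
use strict;
use warnings;

# Assumed definitions; the post does not show how these are set.
my $startBoldTag = '<b>';
my $endBoldTag   = '</b>';

# Sample input with a line break between two of the bold runs.
my $allHtmlDocument = "<b>abcd</b><b>efgh</b>\n<b>ijkl</b>";

# Remove every closing bold tag that is followed only by optional
# whitespace and then an opening bold tag, merging the adjacent runs.
$allHtmlDocument =~ s/\Q$endBoldTag\E\s*\Q$startBoldTag\E//gi;

print "$allHtmlDocument\n";
```

Note that the whitespace between the two tags is dropped along with them, as in the original substitution.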
Now the problem is this.
If I have <b>abcd</b><i><b>efgh</i></b>
and I want to turn it into
<b>abcd<i>efgh</i></b>
then I still need to remove the bold tags (as there are only tags between them), but I also need to keep the tags between them. How would I capture those tags? I am unable to figure out any way since I am not reading the whole document line by line.
I tried using special variables, but what if I have other tags (more than one) between the bold tags?
Thanks!
Last edited by abhisheksainiabhishek; 06-10-2008 at 06:32 PM.
I got it now. It's simple, but I couldn't get it because I am very new to Perl.
I did it like this:
$allHtmlDocument =~ s/$endBoldTag(<(.*)>)*$startBoldTag/$1/gi;
Basically, <(.*)> captures the tags between the bold tags, and they are put back by using $1.
Thanks anyway.
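One caveat with the substitution above: .* is greedy, so on a longer line it can run from the first </b> to the last <b> and swallow text in between, and when a capture group is repeated with *, $1 holds only its last repetition. A possible refinement, offered as a sketch rather than a definitive fix, is to match each tag with <[^>]*> and capture the whole run of tags at once. The tag variables and the sample string are assumptions for illustration.

```perl
use strict;
use warnings;

# Assumed definitions; the post does not show how these are set.
my $startBoldTag = '<b>';
my $endBoldTag   = '</b>';

my $sample = '<b>a</b><i><b>b</i></b> text <b>c</b><u><b>d</u></b>';

# Pattern from the post: .* can stretch across text and other bold tags.
my $greedy = $sample;
$greedy =~ s/$endBoldTag(<(.*)>)*$startBoldTag/$1/gi;

# Refinement: ((?:<[^>]*>)*) captures a run of zero or more whole tags.
#   <[^>]*>  matches one tag and cannot run past its closing '>'
#   (?:...)* repeats without capturing, so $1 holds the entire run
my $robust = $sample;
$robust =~ s/\Q$endBoldTag\E((?:<[^>]*>)*)\Q$startBoldTag\E/$1/gi;

print "greedy: $greedy\n";
print "robust: $robust\n";
```

On this sample the two results differ: the refined pattern handles each </b>...<b> pair separately and leaves the plain text between them alone.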