
Regular Expression - Don't Match Sequence of Characters


#1
rsnider19

So I am parsing some HTML using preg_match_all().

This is my string:
<u>Text</u><br />More <strike>Text</strike><br />Even More Text<br /><br />

So my regex as of right now is as follows:
$regex = "/<u>([\w]+[^<]*)<\/u><br \/>([\w]+[^<]*)<br \/>[\w]+:[\s]*([\w]+[^<]*)<br \/><br \/>/";

I basically want to extract 'Text', 'More <strike>Text</strike>', and 'Even More Text'.
The problem is that right now it won't match, because the <strike> throws it off. I am looking for a way to say something like this:
<br \/>([\w]+[^(<br \/>)]*)<br \/>

In other words: find a run of text that doesn't contain '<br />', followed by '<br />'. I hope that is clear enough.
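
For reference, a character class such as [^(<br \/>)] excludes individual characters (here '(', '<', 'b', 'r', space, '/', '>' and ')'), not the sequence '<br />' as a whole. One common way to say "any character, as long as '<br />' does not start here" is a negative lookahead. The following is only a minimal sketch against the sample string above. It assumes the `[\w]+:[\s]*` label from the original pattern can be dropped, since the sample string contains no such label, and the variable names are illustrative.

```php
<?php
// Sample string from the post above.
$html = '<u>Text</u><br />More <strike>Text</strike><br />Even More Text<br /><br />';

// (?:(?!<br \/>).)+ consumes one character at a time, but only while the
// text at the current position is not the literal sequence "<br />".
$notBr = '(?:(?!<br \/>).)+';

// Assumption: the "[\w]+:[\s]*" label of the original pattern is omitted,
// because the sample string has no "Label:" prefix in the third part.
$regex = '/<u>(.*?)<\/u><br \/>(' . $notBr . ')<br \/>(' . $notBr . ')<br \/><br \/>/';

preg_match_all($regex, $html, $matches, PREG_SET_ORDER);

// $matches[0] is the first full match; indexes 1..3 are the captured groups.
print_r(array_slice($matches[0], 1));
```

With the sample string, the three groups should come out as 'Text', 'More <strike>Text</strike>' and 'Even More Text'.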

#2
rsnider19

Solved that problem: I discovered what '?' and greedy mean, lol. Now I have come across another problem: how can you make part of the regex match multiple times? Say I have:

<u>Text</u><br />More <strike>Text</strike><br />Even More Text<br /><br />Even More Text<br /><br />

So I want to do something like this:

(<br \/>.*?<br \/>)+

In other words, match '<br \/>.*?<br \/>' one or more times.
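
A note on the repetition idea: in PCRE, a capturing group that is repeated with + only keeps the text of its last iteration, so wrapping the group in (...)+ would not return every piece. Since preg_match_all() already applies the pattern repeatedly across the string, one possible approach is to write a pattern for a single piece and let the function do the repeating. This is a rough sketch under the assumption that every piece ends with one or more '<br />' tags; the variable names are made up for illustration.

```php
<?php
// Sample string from the post above.
$html = '<u>Text</u><br />More <strike>Text</strike><br />Even More Text<br /><br />'
      . 'Even More Text<br /><br />';

// (.+?)         lazily captures one piece of text,
// (?:<br \/>)+  then consumes the one or more <br /> tags that end it.
// preg_match_all() restarts the pattern after each match, so no outer
// "+" around the capturing group is needed.
preg_match_all('/(.+?)(?:<br \/>)+/', $html, $m);

// $m[1] holds the captured piece from every match, in order.
print_r($m[1]);
```

For the sample string this should give four pieces: '<u>Text</u>', 'More <strike>Text</strike>', and 'Even More Text' twice.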

#3
Alexander

There's a much more efficient method, you know: using an XML parser. Regular expressions on HTML are becoming a cancer.
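
As a rough illustration of the parser route, here is a sketch using PHP's bundled DOM extension (DOMDocument) on the sample string from the first post. The wrapping <div> and the variable names are assumptions added for the example, not something from the original posts; the idea is simply to walk the child nodes and start a new piece at every <br> element.

```php
<?php
// Sample string from the first post.
$html = '<u>Text</u><br />More <strike>Text</strike><br />Even More Text<br /><br />';

// Keep parser warnings out of the output.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Assumption: wrap the fragment in a <div> so it has one known parent node.
$doc->loadHTML('<div>' . $html . '</div>');
$wrapper = $doc->getElementsByTagName('div')->item(0);

$segments = [];
$current = '';
foreach ($wrapper->childNodes as $node) {
    if ($node->nodeName === 'br') {
        // A <br> ends the current piece; empty pieces are skipped.
        if ($current !== '') {
            $segments[] = $current;
        }
        $current = '';
    } else {
        // saveHTML($node) serializes the node with its markup intact.
        $current .= $doc->saveHTML($node);
    }
}
if ($current !== '') {
    $segments[] = $current;
}

print_r($segments);
```

Using $node->textContent instead of saveHTML($node) would drop the inline tags and keep only the text, if that is what is wanted.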