I've just started using regular expressions, but I'm stumped by the following. I'm parsing data and I want to strip the following tag from XML files while leaving the text it surrounds intact.
<![CDATA[ text ]]>
It's the text inside that I would like to keep. I searched the net for the proper way, but all I've come up with is a way to strip the tag along with what is contained inside it. If I break my single expression into two separate regex substitution calls, I get compile-time errors... Thanks in advance.
What I have now, which strips away the tag and what's inside it:
$string =~ s/<![CDATA[]]//i; #substitute it with nothing (strip it away)
The way I would approach this is as follows:
There are three sections of your code: "<![CDATA[ ", text, and " ]]>"
If you wrap the middle in parentheses, as (.*), you can refer back to it using $1 (or similar).
Won't that only work if I'm trying to find a match? I can't do that with the substitution expression, can I? What if I have multiple CDATA tags to worry about? I'll try playing around with it, but I'm not entirely sure I will be writing this out correctly.
$string =~ s/<!\[CDATA\[(.*?)\]\]>/$1/gis;
I tried what you suggested, but it still seems to just remove the tag along with what's inside it. Are you sure that's what it's supposed to be?
The regexp works.
The problem is more than likely because your pattern is spread over multiple lines. Post a sample of the real data.
Code:
$string = '<![CDATA[this is a test]]>';
$string =~ s/<!\[CDATA\[(.*?)\]\]>/$1/gis;
print $string;
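For what it's worth, here is the same substitution written out with the /x modifier so each piece can be commented. This is only an illustrative sketch: the sample string is made up, and it contains two CDATA sections (one spanning two lines) to show what the g and s modifiers do.

```perl
use strict;
use warnings;

# Hypothetical sample input: two CDATA sections, the second spanning two lines.
my $string = "<a><![CDATA[first]]></a>\n<b><![CDATA[second\nline]]></b>";

$string =~ s{
    <!\[CDATA\[   # literal opening delimiter; brackets escaped so they are not a character class
    (.*?)         # capture the text non-greedily, stopping at the nearest closing delimiter
    \]\]>         # literal closing delimiter
}{$1}gsx;         # g: every occurrence, s: let "." match newlines, x: allow whitespace and comments

print $string, "\n";
```

Without the backslashes, [CDATA[] would be read as a character class, which is why the unescaped version never matches the tag as intended.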
You're right, I got it working correctly. Thanks! Can that same substitution be used on, let's say, a string with quotation marks? I would like to use it again, but for a string that looks something like
<some element url = "www.gosomewhere.com" type = "some type" />
and use the expression you gave me to grab the URL between the first set of quotation marks. Would I have to escape the quotation marks, though?
$string =~ s/"(.*?)"/$1/is; #wouldn't work like this, would it?
Try it and see.
Nah, it doesn't seem to work. I thought putting in a couple of backslashes ('\') would let me include the quotation marks, but that doesn't seem to work either. Should I be using some kind of special characters instead of quotation marks?
Unless you want to modify the string, there is no need to use an s/// regexp. Just use m// to find the pattern and capture it in $1 or assign it to a scalar:
Code:
$string = '<some element url = "www.gosomewhere.com" type = "some type" />';
$string =~ /"(.*?)"/;
print $1;
The parentheses are important in the above code.
Code:
$string = '<some element url = "www.gosomewhere.com" type = "some type" />';
($domain) = $string =~ /"(.*?)"/;
print $domain;
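If the tag can carry several quoted attributes, anchoring the pattern on the attribute name is a safer way to pick out the URL than taking the first quoted value. A minimal sketch, assuming the attribute is always written as url = "..." with double quotes (the variable names $url and @values are just for illustration); for real XML a proper parser is usually the better tool.

```perl
use strict;
use warnings;

my $string = '<some element url = "www.gosomewhere.com" type = "some type" />';

# List context: the match returns its captures, so $url receives the first group.
# [^"]* matches everything up to the next double quote.
my ($url) = $string =~ /\burl\s*=\s*"([^"]*)"/;

# With /g in list context, the match returns every captured quoted value.
my @values = $string =~ /"([^"]*)"/g;

print defined $url ? "$url\n" : "no url attribute found\n";
print join(', ', @values), "\n";
```

Note that double quotes have no special meaning inside a regex, so they do not need to be escaped there.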