
Thread: Regex Expressions

  1. #1
    John_L is offline Newbie
    Join Date
    May 2008
    Posts
    10
    Rep Power
    0

    Regex Expressions

    I've just started using regular expressions, but I'm stumped by the following. I'm parsing data and want to strip the following tag found in XML files while leaving the text it surrounds intact.

    <![CDATA[ text ]]>

    It's the text inside that I would like to keep. I searched the net for the proper way, but all I've come up with is a way to strip the tag along with what it contains. If I break my single expression into two separate substitution calls, I get compile-time errors. Thanks in advance.

    What I have now, which strips away the tag and what's inside it:

    $string =~ s/<![CDATA[]]//i; #substitute it with nothing (strip it away)

  3. #2
    Join Date
    Jul 2006
    Posts
    16,466
    Blog Entries
    74
    Rep Power
    143

    Re: Regex Expressions

    The way I would approach this is as follows:
    There are three sections in your string: "<![CDATA[ ", the text, and " ]]>".
    If you wrap the middle in parentheses, as (.*), you can refer to it as $1 in the replacement.
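
    A minimal sketch of the three-section idea, assuming the data sits in a scalar named $string (the sample text is made up). The /x modifier is used here only so the three sections can be laid out on separate lines:

```perl
use strict;
use warnings;

my $string = 'before <![CDATA[text]]> after';

# Three sections: opening marker, captured middle, closing marker.
# The square brackets are escaped because [ and ] are regex metacharacters.
$string =~ s{
    <!\[CDATA\[   # opening marker
    (.*?)         # the text to keep (first capture group)
    \]\]>         # closing marker
}{$1}x;

print "$string\n";   # before text after
```

    The replacement side puts back only the first capture group, so the markers disappear and the text stays.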

  4. #3
    John_L is offline Newbie
    Join Date
    May 2008
    Posts
    10
    Rep Power
    0

    Re: Regex Expressions

    Won't that only work if I'm trying to find a match? I can't do that with the substitution expression, can I? What if I have multiple CDATA tags to worry about? I'll try playing around with it, but I'm not entirely sure I'll be writing this out correctly.

  5. #4
    KevinADC is offline Programmer
    Join Date
    Jan 2007
    Posts
    125
    Rep Power
    0

    Re: Regex Expressions

    $string =~ s/<!\[CDATA\[(.*?)\]\]>/$1/gis;
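
    To see what the pieces of that one-liner do when there are several CDATA sections, here is a small sketch. The sample data and the variable names $xml, $lazy, and $greedy are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample with two CDATA sections, one spanning two lines.
my $xml = "<a><![CDATA[first]]></a>\n<b><![CDATA[second\nline]]></b>";

# Non-greedy (.*?) stops at the nearest ]]>, /g repeats for every section,
# and /s lets . match the newline inside the second section.
(my $lazy = $xml) =~ s/<!\[CDATA\[(.*?)\]\]>/$1/gs;

# Greedy (.*) runs to the LAST ]]>, so the markers in between survive.
(my $greedy = $xml) =~ s/<!\[CDATA\[(.*)\]\]>/$1/gs;

print "$lazy\n---\n$greedy\n";
```

    The non-greedy version leaves "<a>first</a>" and "<b>second\nline</b>", while the greedy one strips only the outermost pair of markers. The /i in the original makes the match case-insensitive; it is harmless here, but XML itself requires the marker in uppercase.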

  6. #5
    John_L is offline Newbie
    Join Date
    May 2008
    Posts
    10
    Rep Power
    0

    Re: Regex Expressions

    I tried what you suggested, but it still seems to just remove the tag along with what's inside it. Are you sure that's what it's supposed to be?

  7. #6
    KevinADC is offline Programmer
    Join Date
    Jan 2007
    Posts
    125
    Rep Power
    0

    Re: Regex Expressions

    The regexp works.

    Code:
    $string = '<![CDATA[this is a test]]>';
    $string =~ s/<!\[CDATA\[(.*?)\]\]>/$1/gis;
    print $string;
    The problem is more than likely that the text you are matching is spread over multiple lines. Post a sample of the real data.
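
    One way the multi-line problem can show up, sketched under the assumption that the input is being read one line at a time (the real data was never posted, so the sample below is invented). An in-memory filehandle stands in for the actual file:

```perl
use strict;
use warnings;

# Hypothetical data: one CDATA section split across two lines.
my $data = "<![CDATA[this is\na test]]>\n";

# Line by line: no single line holds both <![CDATA[ and ]]>, so nothing matches.
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my $by_line = '';
while (my $line = <$fh>) {
    $line =~ s/<!\[CDATA\[(.*?)\]\]>/$1/gis;
    $by_line .= $line;
}
close $fh;

# Whole input at once: undefining $/ makes <$fh> return everything in one read.
open $fh, '<', \$data or die "cannot open in-memory file: $!";
my $whole = do { local $/; <$fh> };
close $fh;
$whole =~ s/<!\[CDATA\[(.*?)\]\]>/$1/gis;

print "line by line: $by_line";
print "slurped:      $whole";
```

    The line-by-line pass leaves the markers untouched, while the slurped string comes out as "this is\na test\n". The /s modifier only helps once the whole section is in the same string.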

  8. #7
    John_L is offline Newbie
    Join Date
    May 2008
    Posts
    10
    Rep Power
    0

    Re: Regex Expressions

    You're right, I got it working correctly. Thanks! Can that same substitution be used on, let's say, a string with quotation marks? I would like to reuse it for a string that has something like

    <some element url = "www.gosomewhere.com" type = "some type" />

    and use the expression you gave me to grab the URL between the first set of quotation marks. Would I have to escape the quotation marks though?

    $string =~ s/"(.*?)"/$1/is; #wouldn't work like this, would it?

  9. #8
    KevinADC is offline Programmer
    Join Date
    Jan 2007
    Posts
    125
    Rep Power
    0

    Re: Regex Expressions

    Try it and see.

  10. #9
    John_L is offline Newbie
    Join Date
    May 2008
    Posts
    10
    Rep Power
    0

    Re: Regex Expressions

    Nah, it doesn't seem to work. I thought putting a couple of backslashes ('\') in would let me include the quotation marks, but that doesn't seem to work either. Should I be using some kind of special characters instead of quotation marks?

  11. #10
    KevinADC is offline Programmer
    Join Date
    Jan 2007
    Posts
    125
    Rep Power
    0

    Re: Regex Expressions

    Unless you want to modify the string, there is no need for an s/// substitution. Just use m// to find the pattern, then read the capture from $1 or assign it to a scalar:

    Code:
    $string = '<some element url = "www.gosomewhere.com" type = "some type" />';
    $string =~ /"(.*?)"/;
    print  $1;
    Code:
    $string = '<some element url = "www.gosomewhere.com" type = "some type" />';
    ($domain) = $string =~ /"(.*?)"/;
    print  $domain;
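    The parentheses are important in the above code: the ones in the pattern capture the text, and the ones around $domain put the match in list context so it returns the capture rather than a true/false result.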
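
    A short sketch of why the parentheses matter, using the same sample string. The variable names and the attribute-anchored pattern at the end are illustrative additions, not part of the code above. Note that a double quote is not a regex metacharacter, so it needs no escaping inside /.../:

```perl
use strict;
use warnings;

my $string = '<some element url = "www.gosomewhere.com" type = "some type" />';

# List context (note the parentheses): the match returns the captured text.
my ($first) = $string =~ /"(.*?)"/;

# Scalar context: the same match only reports success.
my $matched = $string =~ /"(.*?)"/;

# With /g in list context, every quoted value comes back at once.
my @values = $string =~ /"(.*?)"/g;

# Anchoring on the attribute name picks out one value regardless of order.
my ($url) = $string =~ /\burl\s*=\s*"(.*?)"/;

print "first:   $first\n";
print "matched: $matched\n";
print "values:  @values\n";
print "url:     $url\n";
```

    Here $first and $url both hold www.gosomewhere.com, $matched is 1, and @values holds both quoted strings. For anything beyond simple one-off extraction, a real XML parser is usually a safer choice than a regex.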
