
Thread: Regex Expressions

  1. #1
    Newbie John_L
    Join Date
    May 2008
    Posts
    10

    Regex Expressions

    I've just started using regular expressions, but I'm stumped by the following. I'm parsing data and want to strip the following tag found in XML files while leaving the text it surrounds intact.

    <![CDATA[ text ]]>

    It's the text inside that I would like to keep. I searched the net for the proper way, but all I've come up with is a way to strip the tag along with what it contains. If I break my single expression into two regex substitution calls, I get compile-time errors. Thanks in advance.

    What I have now, which strips away the tag and what's inside it:

    $string =~ s/<![CDATA[]]//i; #substitute it with nothing (strip it away)

  2. #2
    Super Moderator WingedPanther
    Join Date
    Jul 2006
    Age
    36
    Posts
    11,680
    Blog Entries
    57

    Re: Regex Expressions

    The way I would approach this is as follows:
    There are three sections of your code: "<![CDATA[ ", text, and " ]]>"
    If you wrap the middle in parentheses, as (.*), you can retrieve it using $1 (or similar).
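
    A minimal sketch of that idea, assuming the whole input is held in one string. The sample `<item>` element and the variable names are made up for illustration, and the sketch uses the non-greedy `(.*?)` rather than `(.*)` so the capture stops at the first `]]>`:

```perl
use strict;
use warnings;

# Sample input; the surrounding <item> element is made up for illustration.
my $xml = '<item><![CDATA[ text ]]></item>';

my $inner;
# Square brackets are regex metacharacters, so the literal ones are escaped.
# (.*?) captures the section body, which Perl exposes as $1 after a match.
if ($xml =~ /<!\[CDATA\[(.*?)\]\]>/s) {
    $inner = $1;
}

print defined $inner ? "captured: '$inner'\n" : "no CDATA section found\n";
```

    The same capture group can also be referenced from the replacement side of s///, so it is not limited to plain matching.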

  3. #3
    Newbie John_L
    Join Date
    May 2008
    Posts
    10

    Re: Regex Expressions

    Won't that only work if I'm trying to find a match? I can't do that with the substitution expression, can I? What if I have multiple CDATA tags to worry about? I'll try playing around with it, but I'm not entirely sure I'll write it out correctly.

  4. #4
    Programmer KevinADC
    Join Date
    Jan 2007
    Posts
    125

    Re: Regex Expressions

    $string =~ s/<!\[CDATA\[(.*?)\]\]>/$1/gis;
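
    To illustrate what the modifiers in that substitution do, here is a small sketch run against input with more than one CDATA section. The helper name `strip_cdata` and the sample elements are hypothetical, not part of the original post:

```perl
use strict;
use warnings;

# Hypothetical helper name; it wraps the substitution from the post above.
sub strip_cdata {
    my ($text) = @_;
    # /g replaces every section, /i ignores case, /s lets . match newlines.
    # The non-greedy (.*?) stops at the first ]]> so sections are not merged.
    $text =~ s/<!\[CDATA\[(.*?)\]\]>/$1/gis;
    return $text;
}

# Made-up sample with two sections, one of which contains a newline.
my $input  = "<a><![CDATA[first]]></a>\n<b><![CDATA[second\nline]]></b>";
my $output = strip_cdata($input);
print "$output\n";
```

    With a greedy `(.*)` instead, the single match would run from the first `<![CDATA[` to the last `]]>`, leaving the markers in between in place.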

  5. #5
    Newbie John_L
    Join Date
    May 2008
    Posts
    10

    Re: Regex Expressions

    I tried what you suggested, but it still seems to just remove the tag along with what's inside it. Are you sure that's what it's supposed to be?

  6. #6
    Programmer KevinADC
    Join Date
    Jan 2007
    Posts
    125

    Re: Regex Expressions

    The regexp works.

    Code:
    $string = '<![CDATA[this is a test]]>';
    $string =~ s/<!\[CDATA\[(.*?)\]\]>/$1/gis;
    print $string;
    The problem is more than likely that the text you are matching is spread over multiple lines. Post a sample of the real data.
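
    One way a substitution like this can appear to fail, assuming the input is processed one line at a time, is sketched below. The sample data and variable names are made up; the real data in this thread was never posted, so this is only a guess at the cause:

```perl
use strict;
use warnings;

# Made-up sample in which the CDATA section spans two lines.
my $data = "<note><![CDATA[line one\nline two]]></note>\n";

# Processing line by line: no single line holds both "<![CDATA[" and "]]>",
# so the pattern never matches and the text comes back unchanged.
my @lines   = split /^/m, $data;
my $by_line = '';
for my $line (@lines) {
    $line =~ s/<!\[CDATA\[(.*?)\]\]>/$1/gis;
    $by_line .= $line;
}

# Processing the whole string at once: /s lets . cross the newline.
(my $whole = $data) =~ s/<!\[CDATA\[(.*?)\]\]>/$1/gis;

print "by line: $by_line";
print "whole:   $whole";
```

    With a real file, one common approach is to read the entire contents into a single string (for example by locally undefining `$/`) before running the substitution.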

  7. #7
    Newbie John_L
    Join Date
    May 2008
    Posts
    10

    Re: Regex Expressions

    You're right, I got it working correctly. Thanks! Can that same substitution be used on, let's say, a string with quotation marks? I would like to reuse it for a string that looks something like

    <some element url = "www.gosomewhere.com" type = "some type" />

    and use the expression you gave me to grab the URL between the first set of quotation marks. Would I have to escape the quotation marks, though?

    $string =~ s/"(.*?)"/$1/is; #wouldn't work like this, would it?

  8. #8
    Programmer KevinADC
    Join Date
    Jan 2007
    Posts
    125

    Re: Regex Expressions

    Try it and see.

  9. #9
    Newbie John_L
    Join Date
    May 2008
    Posts
    10

    Re: Regex Expressions

    Nah, it doesn't seem to work. I thought putting in a couple of backslashes ('\') would let me include the quotation marks, but that doesn't seem to work either. Should I be using some kind of special characters instead of quotation marks?

  10. #10
    Programmer KevinADC
    Join Date
    Jan 2007
    Posts
    125

    Re: Regex Expressions

    Unless you want to modify the string, there is no need to use an s/// regexp; just use m// to find the pattern and read the capture from $1 or assign it to a scalar:

    Code:
    $string = '<some element url = "www.gosomewhere.com" type = "some type" />';
    # $1 is only meaningful when the match succeeds, so test the match first
    print $1 if $string =~ /"(.*?)"/;
    Code:
    $string = '<some element url = "www.gosomewhere.com" type = "some type" />';
    ($domain) = $string =~ /"(.*?)"/;
    print $domain;
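    The parentheses around $domain are important in the above code: they put the match in list context, so it returns the captured text rather than a true/false result.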
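
    Building on the examples above, here is a sketch of two variations: anchoring on the attribute name so the match does not depend on `url` being the first quoted value, and collecting every quoted value at once. Double quotes are not regex metacharacters, so they need no escaping inside the pattern. The variable names `$url` and `@values` are only illustrative:

```perl
use strict;
use warnings;

my $string = '<some element url = "www.gosomewhere.com" type = "some type" />';

# Anchor on the attribute name so the match does not depend on url coming first.
my ($url) = $string =~ /\burl\s*=\s*"(.*?)"/i;

# In list context with /g, the match returns every captured value.
my @values = $string =~ /"(.*?)"/g;

print "url: $url\n";
print "all: ", join(', ', @values), "\n";
```

    For anything beyond simple, well-behaved input, a real XML parser is usually a safer choice than a regex.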
