#1  05-20-2008, 04:35 PM
John_L
Regex Expressions

I've just started using regular expressions, but I'm stumped by the following. I'm parsing data and want to strip the following tag found in XML files while leaving the text it surrounds intact.

<![CDATA[ text ]]>

It's the text inside that I'd like to keep. I tried searching the net for the proper way, but all I've come up with is a way to strip the tag along with what is contained inside it. If I break my single expression into two and run them as two regex substitution calls, I get compile-time errors. Thanks in advance.

What I have now, which strips away the tag and what's inside it:

$string =~ s/<![CDATA[]]//i; #substitute it with nothing (strip it away)

#2  05-20-2008, 04:50 PM
WingedPanther
Re: Regex Expressions

The way I would approach this is as follows:
There are three parts to the text you are matching: "<![CDATA[ ", text, and " ]]>"
If you wrap the middle part in parentheses, as (.*), you can refer to it afterwards as $1 (or similar).
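
A minimal sketch of that three-part idea, assuming a single CDATA section on one line. The sample string and the variable names are illustrative and not from the thread. Note that square brackets are regex metacharacters, so the literal ones in the marker are escaped with backslashes.

```perl
use strict;
use warnings;

# Hypothetical input with one CDATA section.
my $string = 'before <![CDATA[ text ]]> after';

# Three parts: literal opener, captured middle, literal closer.
my $inner;
if ($string =~ /<!\[CDATA\[(.*)\]\]>/) {
    $inner = $1;    # whatever the parentheses matched
}

print defined $inner ? "captured: '$inner'\n" : "no CDATA section found\n";
```

The same capture can also be used on the replacement side of a substitution, which is what the later replies in this thread do.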
#3  05-20-2008, 05:47 PM
John_L
Re: Regex Expressions

Won't that only work if I'm trying to find a match? I can't do that with the substitution expression, can I? What if I have multiple CDATA tags to worry about? I'll try playing around with it, but I'm not entirely sure I'll be writing it out correctly.
#4  05-20-2008, 06:34 PM
KevinADC
Re: Regex Expressions

$string =~ s/<!\[CDATA\[(.*?)\]\]>/$1/gis;
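
As a rough illustration of why this one-liner copes with several CDATA sections: the non-greedy `(.*?)` stops at the nearest `]]>`, `/g` repeats the substitution for every section, `/i` ignores case, and `/s` lets `.` match newlines. The sample XML and the variable names below are made up for the comparison.

```perl
use strict;
use warnings;

# Hypothetical input with two CDATA sections.
my $xml = '<a><![CDATA[first]]></a><b><![CDATA[second]]></b>';

# Non-greedy capture: each section is unwrapped separately.
(my $lazy = $xml) =~ s/<!\[CDATA\[(.*?)\]\]>/$1/gis;

# Greedy capture: runs to the LAST "]]>", so the inner markers survive.
(my $greedy = $xml) =~ s/<!\[CDATA\[(.*)\]\]>/$1/gis;

print "$lazy\n";    # <a>first</a><b>second</b>
print "$greedy\n";  # <a>first]]></a><b><![CDATA[second</b>
```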
#5  05-20-2008, 10:08 PM
John_L
Re: Regex Expressions

I tried what you suggested, but it still seems to just remove the tag along with what's inside it. Are you sure that's what it's supposed to be?

#6  05-21-2008, 04:26 AM
KevinADC
Re: Regex Expressions

The regexp works.

Code:
$string = '<![CDATA[this is a test]]>';
# Capture the text between the markers and substitute it for the whole match.
$string =~ s/<!\[CDATA\[(.*?)\]\]>/$1/gis;
print $string;  # prints: this is a test

The problem is more than likely because your pattern is spread over multiple lines. Post a sample of the real data.
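
One possible reading of this diagnosis, sketched under the assumption that the input was being processed one line at a time: if a CDATA section spans two lines, neither line contains both the opener and the closer, so the substitution never matches. Applying it to the whole text in one string works because `/s` lets `.` match the newline. The sample data and variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample: one CDATA section spanning two lines.
my $data = "<note><![CDATA[line one\nline two]]></note>\n";

# Line by line: no single line holds a complete section, so nothing changes.
my @lines   = split /^/, $data;
my $by_line = '';
for my $line (@lines) {
    $line =~ s/<!\[CDATA\[(.*?)\]\]>/$1/gis;
    $by_line .= $line;
}

# Whole string at once: the section is matched across the newline.
(my $whole = $data) =~ s/<!\[CDATA\[(.*?)\]\]>/$1/gis;

print $by_line eq $data ? "line by line: unchanged\n" : "line by line: changed\n";
print $whole;
```

When the text comes from a file, one way to get it into a single string is to read it in slurp mode, that is, with `$/` localised to undef before reading.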
#7  05-21-2008, 07:34 PM
John_L
Re: Regex Expressions

You're right, I got it working correctly. Thanks! Can that same substitution be used on, let's say, a string with quotation marks? I would like to use it again, but for a string that has something like

<some element url = "www.gosomewhere.com" type = "some type" />

and use the expression you gave me to grab the URL between the first set of quotation marks. Would I have to escape the quotation marks, though?

$string =~ s/"(.*?)"/$1/is; #wouldn't work like this, would it?
#8  05-22-2008, 03:35 AM
KevinADC
Re: Regex Expressions

Try it and see.
#9  05-22-2008, 06:41 PM
John_L
Re: Regex Expressions

Nah, doesn't seem to work. I thought putting in a couple of backslashes ('\') would let me include the quotation marks, but that doesn't seem to work either. Should I be using some kind of special characters instead of quotation marks?
#10  05-22-2008, 06:52 PM
KevinADC
Re: Regex Expressions

Unless you want to modify the string, there is no need to use an s/// regexp. Just use m// to find the pattern and read the capture from $1, or assign it to a scalar:

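Code:
$string = '<some element url = "www.gosomewhere.com" type = "some type" />';
$string =~ /"(.*?)"/;   # the first quoted value is captured into $1
print $1;
Code:
$string = '<some element url = "www.gosomewhere.com" type = "some type" />';
($domain) = $string =~ /"(.*?)"/;   # list context returns the capture
print $domain;

The parentheses around $domain are important in the above code: they put the match in list context, so it returns the captured text rather than a true/false result.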
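
To tie this back to the earlier question, here is a small sketch comparing the substitution with the match. Double quotes are not regex metacharacters, so they need no escaping; the substitution does run, but it edits the whole string instead of extracting the URL. Anchoring on the attribute name and the `/g` list-context variant are additions for illustration, and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $string = '<some element url = "www.gosomewhere.com" type = "some type" />';

# s/// edits a copy of the whole string: only the first pair of quotes goes away.
(my $stripped = $string) =~ s/"(.*?)"/$1/;

# m// in list context returns just the capture. Naming the attribute
# avoids relying on attribute order.
my ($url) = $string =~ /\burl\s*=\s*"(.*?)"/;

# With /g in list context, every quoted value is returned.
my @values = $string =~ /"(.*?)"/g;

print "$stripped\n";
print "$url\n";
print join(' | ', @values), "\n";
```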