
I need help with the black magic of regex!

4 replies to this topic

#1
c1neast


    Newbie

  • Members
  • Pip
  • 3 posts
I need to extract a string from another string using regex. Please don't suggest Substring, because the solution has to work when the number of characters in the string varies.
This is my string (for example):
-sv"><a href="http://sv.wikipedia.org/wiki/%C3%84pple" title="
The length of this part of the string, and of the rest of the string, may vary.
What I want to extract is the link between the quotation marks. I've tried to write a regex, but I couldn't get the quotation marks to work inside the quoted string. This is what it looks like (I have no idea what I did there...):
    Regex linkRegex = new Regex(@"sv""><a href=""\s*(?sv""><a href=""[^<]+)\s*"" title""", System.Text.RegularExpressions.RegexOptions.Compiled);

    if (linkRegex.IsMatch(k))
    {
        Match match = linkRegex.Match(k);
        string theLink = match.Groups["link"].Value;
        x = theLink;
    }

The point is: it doesn't work, and I don't see it working anytime soon without help from someone who knows this black magic.
Thanks in advance :)
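For reference, here is a minimal C# sketch of what the attempt above seems to be aiming for (it is not from the thread). Three things look wrong in the original pattern: `(?sv""...` is not a valid .NET grouping construct, so constructing the Regex should throw an ArgumentException; the pattern never defines a group named `link`, even though the code reads `Groups["link"]`; and `title""` is missing its `=`. The sketch assumes the input always contains the exact text `sv"><a href="` followed later by `" title="`. The class and method names are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class SvLinkExtractor
{
    // Verbatim string: each "" stands for one literal quotation mark.
    // (?<link>...) is .NET's named-group syntax, so Groups["link"] exists.
    // [^""]+ means "one or more characters that are not a quotation mark",
    // so the capture stops at the closing quote of the href value.
    private static readonly Regex LinkRegex = new Regex(
        @"sv""><a href=""(?<link>[^""]+)"" title=""",
        RegexOptions.Compiled);

    // Returns the href value that follows sv"><a href=", or "" if there is none.
    public static string Extract(string k)
    {
        // One Match call replaces the IsMatch + Match pair.
        Match match = LinkRegex.Match(k);
        return match.Success ? match.Groups["link"].Value : string.Empty;
    }
}
```

Calling `SvLinkExtractor.Extract(k)` on the sample string above should return `http://sv.wikipedia.org/wiki/%C3%84pple`.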

#2
WingedPanther


    A spammer's worst nightmare

  • Moderators
  • 16,831 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others
You'll probably have to escape the quotation marks with something like \"
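In C# the escape depends on the kind of string literal, which may be why escaping "simply doesn't work" later in the thread: inside a verbatim `@"..."` string a backslash escapes nothing, so a quotation mark has to be doubled as `""`. A small sketch (the class and member names are illustrative only):

```csharp
using System.Text.RegularExpressions;

public static class QuoteEscaping
{
    // Regular C# string literal: a quotation mark is written as \"
    public const string Regular = "href=\"";

    // Verbatim C# string literal (@"..."): a quotation mark is written as "".
    // Backslash escapes are NOT processed inside verbatim strings.
    public const string Verbatim = @"href=""";

    // Both literals hold the same text: href="
    // The quotation mark is not a regex metacharacter, so once the C# literal
    // is right, the regex engine needs no extra escaping for it.
    public static bool ContainsHrefQuote(string input)
    {
        return Regex.IsMatch(input, Verbatim);
    }
}
```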

Also, you need to be aware of the language you're doing this in, as Perl RegEx is slightly different from JavaScript RegEx, which is slightly different from Java RegEx, which is slightly different from ...
Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

#3
c1neast


    Newbie

  • Members
  • Pip
  • 3 posts
Well, what I've done so far is more or less a copy-paste, so I really need someone to write a fitting regex for me, since I don't understand it at all.
I'm using C#.
EDIT: I also tried escaping the quotation marks, but it simply doesn't work, and that may be the problem itself.

#4
Alexander


    It's Science!

  • Moderators
  • 4,118 posts
  • Location:Vancouver, Eh!
  • Cleverness: 200
The following pattern extracts the HREF attribute value (capture group 1); it works provided you set RegexOptions.IgnoreCase and RegexOptions.IgnorePatternWhitespace:
@"<a[^>]*?HREF\s*=\s*[""']?([^'"" >]+)[ '""]?[^>]*?>";
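A hedged sketch of running that pattern over a whole page and collecting every link. It uses the pattern in full, starting with `<a[^>]*?`, and with a greedy `+` in the capture group: with a lazy `+?` the match can stop after the first character of the URL, because the trailing `[^>]*?` is free to absorb the rest of the tag. Under IgnorePatternWhitespace, .NET still treats the space inside a character class literally. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefExtractor
{
    // Group 1 captures the href value, quoted with " or ', or unquoted.
    private static readonly Regex HrefRegex = new Regex(
        @"<a[^>]*?HREF\s*=\s*[""']?([^'"" >]+)[ '""]?[^>]*?>",
        RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

    // Returns every href value found in the HTML, in document order.
    public static List<string> ExtractAll(string html)
    {
        var links = new List<string>();
        foreach (Match m in HrefRegex.Matches(html))
        {
            links.Add(m.Groups[1].Value);
        }
        return links;
    }
}
```

Note that this returns every link on the page; it does not by itself pick out one particular link.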

Be sure to read the updated FAQ! || Health is achieved through the same 10,000 steps.
If a suggested code/method fails, informing us is less important than telling us why or what errors occurred.

#5
c1neast


    Newbie

  • Members
  • Pip
  • 3 posts
Well, it is crucial that I get the sv-part before the href, or it might pick one of the other 500 links with the same surrounding letters. :/
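Since the links all look alike, the `-sv">` marker in front of the wanted link is the part to anchor on. A hedged sketch using a lookbehind, assuming the link is always written exactly as `-sv"><a href="...` with no extra attributes or whitespace in between (the class and method names are hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class SvOnlyLink
{
    // (?<=...) is a lookbehind: the match must be preceded by -sv"><a href="
    // but that prefix is not part of the returned text. Links that are not
    // directly preceded by the -sv"> marker cannot match.
    // [^""]+ then takes everything up to the closing quotation mark.
    private static readonly Regex SvLinkRegex = new Regex(
        @"(?<=-sv""><a href="")[^""]+");

    // Returns the first link after the -sv"> marker, or "" if there is none.
    public static string Find(string html)
    {
        Match m = SvLinkRegex.Match(html);
        return m.Success ? m.Value : string.Empty;
    }
}
```

.NET allows variable-length lookbehind, so if the spacing varies, the single space could be replaced with `\s+` inside the lookbehind.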



