Regular Expression Help needed

#1
Vswe

    Writes binary right handed and hex left handed

  • Members
  • 9,552 posts
I'm currently writing a code highlighter for VB.NET using regular expressions. It paints the text in one of three colors (or leaves it black) depending on which of three regular expressions matches:


Some Keywords(Blue)
Expression used: \W(?<match>Me|Class|Structure|End)\W

Me, Class, Structure and End are just four of the keywords; the real expression includes all of them. The \W on each side is there because I only want whole-word matches, not a match inside Meters, for example.


Comments(Green)
Expression used: '.*

A VB.NET comment starts with a single quote and continues to the end of the line, no matter what follows.


Strings(Red)
Expression used: ".*"

A VB.NET string is enclosed in double quotes



The problems I'm having:

Two things aren't working, though, and I hope someone can help me fix the regular expressions so they work properly. The two things are:

1) Since my first regular expression looks for one non-word character on each side of the keyword, it won't find both Class and End in the following example:

End Class

The reason is that there's only one non-word character between the keywords, and each keyword wants one of its own. If I only search for a non-word character before or after the keyword (instead of both), it doesn't solve the actual problem (only before: meters will match Me; only after: vend will match End).


2) Regular expressions #2 and #3 work fine on their own, but together they cause a problem. Consider this line:

MessageBox.Show("This is a single quote ' and will cause a problem", myCaption, MessageBoxButtons.OK)

Regular expressions #2 and #3 will each get a match, which results in this highlighting:

MessageBox.Show([COLOR="red"]"This is a single quote ' and will cause a problem"[/COLOR][COLOR="green"],myCaption, MessageBoxButtons.OK)[/COLOR]

The reason is that regular expression #3 finds the string and paints it red (as it should), while regular expression #2 finds the single quote and paints everything after it green (as it usually should, but not here, since the quote is part of a string and does not start a comment). Since expression #3 paints after #2, the red covers the green inside the string, and the result is that all text after the string is still green.



Any ideas on how to solve these two problems? Thanks in advance.

/Vswe :)

#2
WingedPanther

    A spammer's worst nightmare

  • Moderators
  • 16,831 posts
Does your regex engine support backreferences?

I'm vaguely recalling that some flavors support non-capturing lookbacks (don't recall the exact terminology)
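
The feature being recalled here is usually called a zero-width assertion: the word boundary \b, or lookbehind/lookahead. As far as I know the .NET Regex class supports both. Below is a rough sketch of the difference, written in Python rather than VB.NET only so that it is easy to run; Python spells the named group (?P<match>...) where .NET uses (?<match>...). The names consuming, boundary, lookaround and keywords_found are made up for this illustration.

```python
import re

KEYWORDS = r"Me|Class|Structure|End"

# Original approach: one non-word character is consumed on each side.
consuming = re.compile(r"\W(?P<match>" + KEYWORDS + r")\W", re.IGNORECASE)

# Zero-width word boundaries: nothing around the keyword is consumed.
boundary = re.compile(r"\b(?P<match>" + KEYWORDS + r")\b", re.IGNORECASE)

# The same idea spelled out with lookbehind/lookahead assertions.
lookaround = re.compile(r"(?<!\w)(?P<match>" + KEYWORDS + r")(?!\w)", re.IGNORECASE)


def keywords_found(pattern, text):
    """Return the text of the 'match' group for every match of pattern."""
    return [m.group("match") for m in pattern.finditer(text)]


sample = " End Class "
print(keywords_found(consuming, sample))        # ['End']
print(keywords_found(boundary, sample))         # ['End', 'Class']
print(keywords_found(lookaround, sample))       # ['End', 'Class']
print(keywords_found(boundary, "Meters vend"))  # []
```

With the consuming version, the space after End is already used up, so Class has no non-word character left in front of it. The zero-width versions check the neighbours without using them up.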
Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

#3
Vswe

No, the only thing I can set, apart from the regular expression and the text to run it on, is a start index for the search.

#4
WingedPanther

It occurs to me that you may be approaching it wrong. How are you using the regexes?

#5
Vswe

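    ' Requires "Imports System.Text.RegularExpressions" at the top of the file.
    Private Sub ChangeStyle(ByRef content As String, ByVal Regular_Expression As String, ByVal newColor As Integer, Optional ByVal useMatch As Boolean = False)
        Dim RegexObject As New Regex(Regular_Expression, RegexOptions.IgnoreCase)
        Dim RegexMatch As MatchCollection = RegexObject.Matches(content)

        Dim Match As Match
        Dim Index, Length As Integer
        ' Walk the matches from last to first so that inserting tags
        ' does not shift the indices of matches still to be processed.
        For i As Integer = RegexMatch.Count - 1 To 0 Step -1
            Match = RegexMatch(i)
            If useMatch Then
                ' Color only the named group "match", not the surrounding \W characters.
                Index = Match.Groups("match").Index
                Length = Match.Groups("match").Length
            Else
                Index = Match.Index
                Length = Match.Length
            End If

            ' Insert the end tag first: it sits after Index, so Index stays valid.
            content = content.Insert(Index + Length, "\cf0 ")
            ' The trailing space delimits the RTF control word from the text that follows.
            content = content.Insert(Index, "\cf" & newColor & " ")
        Next
    End Sub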

The last two lines in the loop insert the start and end tags for the RichTextBox color change.

#6
bobdark

    Programmer

  • Members
  • 164 posts
First, I am used to Flex's way of writing regular expressions, so please forgive me for not writing actual expressions; I am also not that familiar with C#, so the same goes there.
Anyway, regarding the first problem, how about defining something like this: r = ((Me|Class|Structure|End)(\ |\t)+)*(Me|Class|Structure|End)
By \ (a backslash followed by a space) I mean the space character. The whole expression should mean: one of your words followed by one or more whitespace characters, repeated zero or more times, and then one of the words again at the end. It seems to me that this should solve your problem.

Regarding the second matter, let me get this straight - do you search first for everything in text that matches the first reg. exp., then the same goes for second and third expressions?
If that's the case, I think this approach is wrong and what you should do is iterate only once over the content and apply to each token only one rule.
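
A minimal sketch of that single-pass idea, in Python for easy testing (the same alternation should carry over to .NET with (?<name>...) groups). The three rules are joined into one pattern, so a quote that was already consumed as part of a string can never start a comment. Assumptions of mine, not from the thread: the names TOKEN, COLORS, tokenize and highlight, the color numbers, and the string rule "[^"\n]*" in place of the greedy ".*".

```python
import re

# One combined pattern; at each position the alternatives are tried in order.
TOKEN = re.compile(
    r'(?P<string>"[^"\n]*")'                       # string literal on one line
    r"|(?P<comment>'.*)"                           # comment: quote to end of line
    r"|\b(?P<keyword>Me|Class|Structure|End)\b",   # whole-word keyword
    re.IGNORECASE,
)

# Hypothetical color-table indices, for illustration only.
COLORS = {"keyword": 1, "string": 2, "comment": 3}


def tokenize(text):
    """Return (kind, text) pairs, one rule per token, in a single pass."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(text)]


def highlight(text):
    """Wrap every token in RTF-style color tags."""
    def paint(m):
        return "\\cf%d %s\\cf0 " % (COLORS[m.lastgroup], m.group())
    return TOKEN.sub(paint, text)


line = 'MessageBox.Show("This is a single quote \' and will cause a problem", myCaption, MessageBoxButtons.OK)'
print(tokenize(line))
print(tokenize("End Class ' done"))
print(highlight('x = "a" \' c'))
```

Because the scan moves left to right and each character belongs to at most one token, painting order no longer matters.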

#7
WingedPanther

You may be able to leverage that. Make sure you have a \cf0 at the beginning of your string, then you can have your regex match something like the following:
(\\cf0[^\\]*)('.*)
you could replace that with
$1&"\cf"&newcolor&$2

Granted, I'm viewing this from a non-VB regex perspective (I mainly work in jEdit), but something along those lines may work.
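
To make the suggestion concrete, here is a small sketch in Python (not VB.NET), assuming the strings have already been painted and the text starts with \cf0. One deliberate change from the pattern above: the first group uses [^\\']* instead of [^\\]*, so the first apostrophe in default-colored text starts the comment, not the last one. The names COMMENT_IN_DEFAULT and paint_comments are hypothetical.

```python
import re

# A comment start is accepted only if it can be reached from a "\cf0" tag
# without crossing another backslash, i.e. it sits in default-colored text.
COMMENT_IN_DEFAULT = re.compile(r"(\\cf0[^\\']*)('.*)")


def paint_comments(rtf, new_color):
    """Insert a color tag in front of comments found in default-colored text."""
    return COMMENT_IN_DEFAULT.sub(
        lambda m: m.group(1) + "\\cf%d " % new_color + m.group(2),
        rtf,
    )


# The quote inside the already painted string is left alone.
painted = '\\cf0 Show(\\cf2 "a \' b"\\cf0 , myCaption)'
print(paint_comments(painted, 3) == painted)      # True

# A real comment in default-colored text gets its tag.
print(paint_comments("\\cf0 x = 1 ' don't", 3))   # \cf0 x = 1 \cf3 ' don't
```

One limitation of this sketch: a comment that itself contains a painted string is only colored up to that string's tag.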

#8
Vswe

bobdark said:

First, I am used to Flex's way of writing regular expressions, so please forgive me for not writing actual expressions; I am also not that familiar with C#, so the same goes there.
Anyway, regarding the first problem, how about defining something like this: r = ((Me|Class|Structure|End)(\ |\t)+)*(Me|Class|Structure|End)
By \ (a backslash followed by a space) I mean the space character. The whole expression should mean: one of your words followed by one or more whitespace characters, repeated zero or more times, and then one of the words again at the end. It seems to me that this should solve your problem.

Regarding the second matter, let me get this straight - do you search first for everything in text that matches the first reg. exp., then the same goes for second and third expressions?
If that's the case, I think this approach is wrong and what you should do is iterate only once over the content and apply to each token only one rule.

That won't work, since it will match all the consecutive keywords as a single match (if they come right after each other), and therefore the characters between the keywords will also be painted (and these don't have to be spaces).

WingedPanther said:

You may be able to leverage that. Make sure you have a \cf0 at the beginning of your string, then you can have your regex match something like the following:
(\\cf0[^\\]*)('.*)
you could replace that with
$1&"\cf"&newcolor&$2

Granted, I'm viewing this from a non-VB regex perspective (I mainly work in jEdit), but something along those lines may work.

That might work, thanks. I'll give it a try.