C# Regular Expressions

chili5


Regular expressions are useful for matching strings using patterns. These patterns can be complicated, but they provide a really powerful way of manipulating strings. If you are not familiar with the syntax, have a look here. Regular expressions are very useful when developing websites, as they provide a fairly simple way of validating data the user inputs.

Regex Object

All the regular expression methods in C# are provided by the Regex class in the System.Text.RegularExpressions namespace. So make sure you import that namespace with the following code:

 using System.Text.RegularExpressions;
 

Creating a Regex Object


To demonstrate how to create a regex object let us consider the problem of validating a phone number. This isn’t actually easy since a phone number can take many formats. For this example we will simply ensure that phone numbers follow the form XXX-XXX-XXXX where X is a digit from 0 to 9.
Code:

  Regex regex = new Regex("[0-9]{3}-[0-9]{3}-[0-9]{4}");

The Regex constructor takes the pattern that the regex is going to use as its first parameter. A second overload lets you specify Regex options such as case-insensitivity and multiline, which we will look at later.
The pattern [0-9]{3}-[0-9]{3}-[0-9]{4} means to match any 3 digits followed by a hyphen followed by another 3 digits followed by another hyphen and finally followed by 4 more digits. An example string that matches this pattern is 555-555-5555. Some examples of strings that do not match this pattern are 55-555-5555 and AAA-AAA-AAAA.

Matching Strings

The most common task is to test whether an input string matches a certain pattern. To accomplish this we use the regex.IsMatch(string input) method. It returns true if the pattern given to the Regex constructor matches anywhere in input, and false otherwise. To require that the whole input matches, anchor the pattern with ^ at the start and $ at the end.

Example:

  Regex regex = new Regex("[0-9]{3}-[0-9]{3}-[0-9]{4}");
  string input = "555-555-5555";
  Console.WriteLine(regex.IsMatch(input) ? "match" : "no match");

The expression regex.IsMatch(input) ? "match" : "no match" uses the conditional operator (?:), which is a shorthand for if-else. If the method returns true the expression evaluates to "match", otherwise it evaluates to "no match", and that value is what gets written to the console. This shorthand lets you write a simple if-else structure in one line of code, which can greatly shorten your code.

This could equivalently be written like this:
 if (regex.IsMatch(input)){  
      Console.WriteLine("match"); 
} else { 
      Console.WriteLine("no match");
} 

The output to the above code when input = "555-555-5555" is:

match

Try changing input to “AAA-AAA-AAAA” and “3-333-33333”. In both cases the output will be “no match”.
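
One caveat is worth illustrating here: IsMatch succeeds if the pattern is found anywhere in the input, so a longer string that merely contains a phone number also passes. The following is a minimal sketch, assuming we want the whole input to be a phone number, that compares the unanchored pattern with a version anchored by ^ and $. The class name PhoneValidation and the sample inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneValidation
{
    // Unanchored: succeeds if the pattern occurs anywhere in the input.
    public static readonly Regex Loose = new Regex("[0-9]{3}-[0-9]{3}-[0-9]{4}");

    // Anchored: ^ and $ require the pattern to span the entire input.
    public static readonly Regex Strict = new Regex("^[0-9]{3}-[0-9]{3}-[0-9]{4}$");

    public static void Main()
    {
        // Illustrative inputs: an exact number, a number inside a sentence, a malformed number.
        string[] inputs = { "555-555-5555", "call 555-555-5555 now", "55-555-5555" };

        foreach (string input in inputs)
        {
            Console.WriteLine("{0}: loose={1}, strict={2}",
                input, Loose.IsMatch(input), Strict.IsMatch(input));
        }
    }
}
```

Only the first input passes the strict check. The second passes the loose check alone, and the third fails both.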

Regex Options

First we will look at the case-insensitive option. We will extend the phone number regex defined above to also allow letters.
We can create this regex like this:

  Regex regex = new Regex("[0-9A-Z]{3}-[0-9A-Z]{3}-[0-9A-Z]{4}");

Now strings such as “555-555-5555” and “AA5-AAA-3A2B” will match this pattern. If we want to make this pattern case-insensitive we have two options. We can add a-z to all three character classes or we can make use of a regex option.
If we take the option of adding a-z to all three character classes then the regex object becomes this:

  Regex regex = new Regex("[0-9A-Za-z]{3}-[0-9A-Za-z]{3}-[0-9A-Za-z]{4}");

The other, slightly cleaner option is to make use of a Regex option, which gets passed as the second parameter to the Regex constructor.

Example:

  Regex regex = new Regex("[0-9A-Z]{3}-[0-9A-Z]{3}-[0-9A-Z]{4}",RegexOptions.IgnoreCase);

RegexOptions is an enum which defines several options for manipulating a regex. We will shortly have a look at the multiline regex option.
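
Because RegexOptions is a flags enum, several options can be combined with the bitwise OR operator |. Here is a small sketch of the syntax, reusing the pattern from above. The class name OptionsDemo is chosen purely for illustration, and Multiline is included only to show the combining syntax.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionsDemo
{
    // Two options combined into a single RegexOptions value with |.
    public static readonly Regex Combined = new Regex(
        "[0-9A-Z]{3}-[0-9A-Z]{3}-[0-9A-Z]{4}",
        RegexOptions.IgnoreCase | RegexOptions.Multiline);

    // Returns true if the given option was set when the regex was created.
    public static bool HasOption(Regex regex, RegexOptions option)
    {
        return (regex.Options & option) == option;
    }

    public static void Main()
    {
        // Lowercase letters match because IgnoreCase is part of the combined value.
        Console.WriteLine(Combined.IsMatch("55a-A55-5555"));
        Console.WriteLine(HasOption(Combined, RegexOptions.IgnoreCase));
        Console.WriteLine(HasOption(Combined, RegexOptions.Multiline));
    }
}
```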

Try this code:
Regex regex = new Regex("[0-9A-Z]{3}-[0-9A-Z]{3}-[0-9A-Z]{4}",RegexOptions.IgnoreCase);
 
string input = "55a-A55-5555";
Console.WriteLine(regex.IsMatch(input) ? "match" : "no match");
 

This will output match: since the pattern is case-insensitive, each character class matches any digit 0-9 and any letter A-Z or a-z.

Multiline option

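When looking at the multiline option we get to introduce two new concepts: a multiline string and the Regex.Matches method. Note that RegexOptions.Multiline only changes the anchors ^ and $ so that they match at the start and end of every line instead of only at the start and end of the whole string. The pattern used below contains no anchors, so Matches would find the same results without the option.
First, to create a multiline regex, write the Regex constructor like this: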

 Regex regex = new Regex("[0-9A-Z]{3}-[0-9A-Z]{3}-[0-9A-Z]{4}",RegexOptions.Multiline);
 

Now to create a multiline string. Prefixing a string literal with the @ symbol makes it a verbatim string literal, which may span several lines:

  string s = @"
            555-555-5555
            AAA-AAA-AAAA
            222-222-2222
            This is just a test. 333-333-3332
            ";  

Now what we want to accomplish is to find and output all strings in s that match the pattern. To do this we use the Matches method which returns a MatchCollection storing all matches of the pattern.

Example code:

Regex regex = new Regex("[0-9A-Z]{3}-[0-9A-Z]{3}-[0-9A-Z]{4}",RegexOptions.Multiline);
 
string s = @"
555-555-5555
AAA-AAA-AAAA
222-222-2222  This is just a test. 333-333-3332
";
 
// Declaring the loop variable as Match gives typed access to each result.
foreach (Match match in regex.Matches(s))
{
       Console.WriteLine(match.Value);
}
 

When running this code the output is:

555-555-5555
AAA-AAA-AAAA
222-222-2222
333-333-3332

which is what we expected.
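
To see where RegexOptions.Multiline actually makes a difference, the pattern needs the anchors ^ and $. The sketch below assumes we only want phone numbers that stand alone on a line, and counts matches with and without the option. The class name MultilineDemo, the helper CountMatches and the sample text are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultilineDemo
{
    // Anchored pattern: a line must consist of nothing but the phone number.
    private const string Pattern = "^[0-9]{3}-[0-9]{3}-[0-9]{4}$";

    // Counts the matches of the anchored pattern in text under the given options.
    public static int CountMatches(string text, RegexOptions options)
    {
        return new Regex(Pattern, options).Matches(text).Count;
    }

    public static void Main()
    {
        // "\n" line breaks are used on purpose: $ matches before "\n" but not before "\r".
        string text = "555-555-5555\n222-222-2222\nThis is just a test. 333-333-3332";

        // Without Multiline, ^ and $ refer to the whole string, which is not a single phone number.
        Console.WriteLine(CountMatches(text, RegexOptions.None));      // 0

        // With Multiline, ^ and $ match at every line boundary, so the first two lines match.
        Console.WriteLine(CountMatches(text, RegexOptions.Multiline)); // 2
    }
}
```

The number on the third line is not counted in either case, because it does not start at the beginning of its line.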

Replacing Strings

Regex is also useful for replacing substrings that match a certain pattern. One application of this method is a bad word filter.
To replace text using Regex we use the Replace method, which takes two parameters: the string to search in and the text to replace every match with. Let us reuse the example from the multiline section.

  Regex regex = new Regex("[0-9A-Z]{3}-[0-9A-Z]{3}-[0-9A-Z]{4}",RegexOptions.Multiline);
 
string s = @"
            555-555-5555
            AAA-AAA-AAAA
            222-222-2222
            This is just a test. 333-333-3332
            ";
 
s = regex.Replace(s, "XXX-XXX-XXXX");
 
// After the replacement, every match in s is the text XXX-XXX-XXXX.
foreach (Match match in regex.Matches(s))
{
    Console.WriteLine(match.Value);
}

This code searches s for all substrings that match the pattern [0-9A-Z]{3}-[0-9A-Z]{3}-[0-9A-Z]{4} and replaces each of them with XXX-XXX-XXXX. Replace does not modify the original string; it returns the result as a new string, which we store back in s.
Notice that XXX-XXX-XXXX also matches the pattern so when we run this code the output will be:

XXX-XXX-XXXX
XXX-XXX-XXXX
XXX-XXX-XXXX
XXX-XXX-XXXX
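
The bad word filter mentioned at the start of this section can be sketched along the same lines. This is only an illustration under a few assumptions: the word list (darn, heck) and the class name WordFilter are made up, and instead of a fixed replacement text it uses the Replace overload that takes a MatchEvaluator, so each word is replaced by asterisks of the same length.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordFilter
{
    // Hypothetical word list; \b restricts matches to whole words.
    private static readonly Regex Blocked =
        new Regex(@"\b(darn|heck)\b", RegexOptions.IgnoreCase);

    public static string Censor(string text)
    {
        // The MatchEvaluator is called once per match and returns its replacement.
        return Blocked.Replace(text, m => new string('*', m.Length));
    }

    public static void Main()
    {
        // Prints: ****, what the **** happened to the darning needle?
        Console.WriteLine(Censor("Darn, what the heck happened to the darning needle?"));
    }
}
```

The word "darning" is left alone because \b requires a word boundary directly after "darn".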




