
Regular Expressions


11 replies to this topic

#1 chili5

    CC Mentor

  • Expert Member
  • 3,031 posts
  • Programming Language:Java, C#, PHP, JavaScript, Transact-SQL
  • Learning:C, Java, C++, C#, PHP, JavaScript, Transact-SQL, Assembly, Scheme

Posted 27 August 2009 - 02:57 AM

Regular Expressions in Java

Regular expressions provide a fast and flexible way to match strings and to find patterns within strings. The String class uses them in methods such as matches and replaceAll. Text editors use them a lot, and they can make some tasks really easy.

The package you need to import is java.util.regex (import java.util.regex.*;). It contains the two classes you will need: Pattern and Matcher.

Regular Expressions

Regular expressions can be a complicated but useful tool for manipulating and validating strings.

First we will look at some simple regular expressions.

[A-Z] 

This is a character class, and it matches any single uppercase letter from A to Z. This is the same as

return ch >= 'A' && ch <= 'Z';

except that it is a string operation, not a character operation: used with String.matches, the whole string must be that one letter.

The above will match:

A, E, I, D, G

It will not match: a, e, i, o, !, ?.
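The claims above are easy to check with String.matches on single-character strings. The sketch below compares the regex test with the character test; the class and method names (CharClassDemo, isUpperRegex, isUpperChar) are hypothetical, chosen only for illustration.

```java
class CharClassDemo {
    // Regex version: the whole string must be exactly one character from A to Z.
    static boolean isUpperRegex(String s) {
        return s.matches("[A-Z]");
    }

    // Character version: the same range test applied to a single char.
    static boolean isUpperChar(char ch) {
        return ch >= 'A' && ch <= 'Z';
    }

    public static void main(String[] args) {
        String[] samples = {"A", "E", "I", "D", "G", "a", "e", "i", "o", "!", "?"};
        for (String s : samples) {
            // Both checks should agree for every single-character sample.
            System.out.println(s + " -> " + isUpperRegex(s) + " / " + isUpperChar(s.charAt(0)));
        }
        // "AB" fails: matches() needs the whole string to be ONE uppercase letter.
        System.out.println("AB -> " + isUpperRegex("AB"));
    }
}
```

Note that "AB" does not match, because matches() requires the entire string to be a single uppercase letter.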

[^A-Z]

This regular expression matches any single character that is not an uppercase letter.
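A negated class is handy with a Matcher, for example to find where a string stops being all uppercase. This is a minimal sketch; NegatedClassDemo and firstNonUppercase are hypothetical names, and it uses Matcher.find and Matcher.start from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegatedClassDemo {
    // Matches one character that is NOT in the range A-Z.
    private static final Pattern NOT_UPPER = Pattern.compile("[^A-Z]");

    // Returns the index of the first character that is not A-Z, or -1 if there is none.
    static int firstNonUppercase(String s) {
        Matcher m = NOT_UPPER.matcher(s);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstNonUppercase("ABCdEF")); // 3
        System.out.println(firstNonUppercase("ABC"));    // -1
    }
}
```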

We can use curly braces (a quantifier) to say how many times to match an expression.

Example:

[A-Z]{1,}

This expression matches one or more uppercase letters.

Matches: A, BC, DEF, GHIKL
No match: , d, ef, gh, ijk

In the No match list, the first item is the empty string. It does not match because the expression requires at least one letter.
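The match and no-match lists above can be reproduced with String.matches. A minimal sketch, with hypothetical names AtLeastOnceDemo and allUppercase:

```java
class AtLeastOnceDemo {
    // One or more uppercase letters, written with the {1,} quantifier.
    static final String PATTERN = "[A-Z]{1,}";

    // matches() requires the whole string to be one or more uppercase letters.
    static boolean allUppercase(String s) {
        return s.matches(PATTERN);
    }

    public static void main(String[] args) {
        String[] samples = {"A", "BC", "DEF", "GHIKL", "", "d", "ef", "gh", "ijk"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + allUppercase(s));
        }
    }
}
```

With matches(), "ABc" would also fail, because every character in the string must be an uppercase letter.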

[a-z]{4}

This regular expression matches exactly 4 lowercase letters.

[a-z]{4,5}

This regular expression matches at least 4 and at most 5 lowercase letters.
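A quick sketch comparing the two quantifiers (ExactCountDemo, exactlyFour and fourOrFive are hypothetical names). Because String.matches tests the whole string, the length limits apply to the entire input:

```java
class ExactCountDemo {
    // {4}: exactly four lowercase letters.
    static boolean exactlyFour(String s) {
        return s.matches("[a-z]{4}");
    }

    // {4,5}: at least four and at most five lowercase letters.
    static boolean fourOrFive(String s) {
        return s.matches("[a-z]{4,5}");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc", "abcd", "abcde", "abcdef", "ABCD"}) {
            System.out.println(s + ": {4}=" + exactlyFour(s) + ", {4,5}=" + fourOrFive(s));
        }
    }
}
```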

Now, what if you want to match something that is not in a character class?

[^AEIOUaeiou]

This regular expression matches any single character that is not a vowel (uppercase or lowercase), including digits, spaces and punctuation.
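Because the class is negated, spaces, digits and punctuation all count as "not a vowel". One way to see this is to delete every non-vowel with replaceAll (covered later in this tutorial) and look at what survives. NotVowelDemo, keepVowels and isNonVowel are hypothetical names used only for this sketch.

```java
class NotVowelDemo {
    // Deletes every character that is not a vowel, so only vowels remain.
    static String keepVowels(String s) {
        return s.replaceAll("[^AEIOUaeiou]", "");
    }

    // True if s is exactly one character and that character is not a vowel.
    static boolean isNonVowel(String s) {
        return s.matches("[^AEIOUaeiou]");
    }

    public static void main(String[] args) {
        System.out.println(keepVowels("Hello, World")); // eoo
        System.out.println(isNonVowel(" "));            // true: a space is not a vowel
        System.out.println(isNonVowel("E"));            // false
    }
}
```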

There is A LOT more that you can do, but that is enough for us to look at how to use these methods in Java. Have a look at this regular expressions tutorial for more.

String Manipulation

Before we look into pattern matching, we will use some regular expressions with the methods in the String class.

String.matches method

Say you have a string and you want to match it against a regular expression to make sure it is valid. It might be a phone number, an email address, an SQL query or something else. Once you have written the regular expression, it is as easy as writing:

if (s.matches("test")) {
	// matches() succeeds only if the WHOLE string matches the pattern, so this is
	// the same as if (s.equals("test")). You will see why this is brilliant later.
	System.out.println("s  = test");
} else {
	System.out.println("s <> test");
}
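The difference between matching the whole string and finding the pattern somewhere inside it matters a lot in practice. The sketch below contrasts String.matches with Matcher.find; MatchesVsFindDemo, wholeMatch and containsMatch are hypothetical helper names.

```java
import java.util.regex.Pattern;

class MatchesVsFindDemo {
    // True only if the entire string matches the regex (what String.matches does).
    static boolean wholeMatch(String s, String regex) {
        return s.matches(regex);
    }

    // True if the regex matches anywhere inside the string.
    static boolean containsMatch(String s, String regex) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("test", "test"));       // true
        System.out.println(wholeMatch("testing", "test"));    // false
        System.out.println(containsMatch("testing", "test")); // true
    }
}
```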

Let us match a string against a phone number. The format of Canadian phone numbers is xxx-xxx-xxxx, where the area code is required. A simple regular expression for this would be:

[0-9]{3}-{1}[0-9]{3}-{1}[0-9]{4}

This matches a 3-digit number, a dash, another 3-digit number, a dash and then a 4-digit number. The {1} after each dash is redundant (a single dash already matches exactly once), but it does no harm.

Consider this phone number:

111-111-1111

Is it valid? Let us use our regular expression to test it.

String s = "111-111-1111";

if (s.matches("[0-9]{3}-{1}[0-9]{3}-{1}[0-9]{4}")) {
	System.out.println("Valid phone number.");
} else {
	System.out.println("Invalid phone number.");
}

The output is:

Valid phone number.


Now let us change the phone number to:
111-111-111

Try this code:

s = "111-111-111";

if (s.matches("[0-9]{3}-{1}[0-9]{3}-{1}[0-9]{4}")) {
	System.out.println("Valid phone number.");
} else {
	System.out.println("Invalid phone number.");
}

Output:

Invalid phone number.
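For repeated validation, one option is to compile the pattern once and reuse it. The sketch below is one possible way to package the phone check, assuming the same xxx-xxx-xxxx format; PhoneValidator and isValidPhone are hypothetical names, and \d is Java's shorthand for [0-9] (ASCII digits only unless the UNICODE_CHARACTER_CLASS flag is set).

```java
import java.util.regex.Pattern;

class PhoneValidator {
    // Compiled once and reused; the redundant {1} quantifiers are dropped.
    private static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{3}-\\d{4}");

    // matches() requires the whole input to be in xxx-xxx-xxxx form.
    static boolean isValidPhone(String s) {
        return PHONE.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"111-111-1111", "111-111-111", "111-111-11111", "(111) 111-1111"}) {
            System.out.println(s + " -> " + (isValidPhone(s) ? "Valid" : "Invalid") + " phone number.");
        }
    }
}
```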


The hard part is getting the regular expression right. Once you have it, regular expressions are great for validating user input in Swing applications. In my project, I created a regex library containing all the regular expressions I used throughout the project. I wrote it on the first day and have used it every day for months since. It is a very USEFUL area.

Even better, this concept applies in a lot of languages: Python, Java, Perl, PHP, VB. Almost any language you can name (except C++ :() has built-in support for regular expressions.

String.replace and String.replaceAll methods

These methods replace the parts of a string that match a target.

Both take two parameters: what to replace and what to replace it with. The difference is that replace treats its first parameter as plain text, while replaceAll treats it as a regular expression.

String s = "Testing this is a something that is just a test.";

Let us replace the word "Test" with the word "game".

s = s.replace("Test","game");

Strings in Java are immutable, so the method does not change s; it returns a new, modified string, which we assign back to s.

Output:

gameing this is a something that is just a test.

Notice that it is case-sensitive.
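Since replace and replaceAll are easy to mix up, here is a small sketch showing the practical difference. Despite the names, both replace every occurrence; what differs is whether the first argument is plain text or a regular expression. ReplaceDemo, replaceLiteral and replaceRegex are hypothetical wrappers; the (?i) prefix is Java's inline flag for case-insensitive matching.

```java
class ReplaceDemo {
    // replace(): the target is plain text, so "." is just a dot.
    static String replaceLiteral(String s, String target, String replacement) {
        return s.replace(target, replacement);
    }

    // replaceAll(): the target is a regular expression.
    static String replaceRegex(String s, String regex, String replacement) {
        return s.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        System.out.println(replaceLiteral("a.b.c", ".", "-")); // a-b-c
        System.out.println(replaceRegex("a.b.c", ".", "-"));   // ----- ("." means any character)
        // (?i) turns on case-insensitive matching for the rest of the pattern.
        System.out.println(replaceRegex("Testing this is just a test.", "(?i)test", "game"));
        // gameing this is just a game.
    }
}
```

In a replaceAll replacement string, $ and \ also have special meaning, so escape them if you need them literally.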

Let us do something more fun: we want to replace all four-letter words with ****. Why? We live on a planet where it is a federal offense to use four-letter words. :) Here, a four-letter word is any run of exactly four word characters that stands on its own.

So our regular expression could be:

\\b\\w{4}\\b

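The \b matches a word boundary: a position between a word character and a non-word character (or the start or end of the string), so a match can never begin or end in the middle of a word. The \w matches a single word character, meaning a letter, a digit or an underscore, so \w{4} matches exactly four of them. In a Java string literal each backslash has to be doubled, which is why the pattern above is written \\b\\w{4}\\b.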

Now, starting again from the original sentence, we simply do:

s = s.replaceAll("\\b\\w{4}\\b", "****");
System.out.println(s);

Output:

Testing **** is a something **** is **** a ****.


Now on planet CC, nobody shall ever say 4 letter words again. :)
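The \w detail matters: \w also matches digits and the underscore, so this pattern censors things that are not really words. A minimal sketch wrapping the same replaceAll call in a hypothetical maskFourLetterWords method:

```java
class FourLetterDemo {
    // \b = word boundary, \w = one word character [a-zA-Z_0-9]; so this masks
    // every run of exactly four word characters.
    static String maskFourLetterWords(String s) {
        return s.replaceAll("\\b\\w{4}\\b", "****");
    }

    public static void main(String[] args) {
        System.out.println(maskFourLetterWords("Testing this is a something that is just a test."));
        // \w also matches digits and underscores, so these count as "four-letter words" too:
        System.out.println(maskFourLetterWords("abc1 a_bc 2009 it's"));
    }
}
```

Also note that an apostrophe is not a word character, so "it's" is treated as two short words and is left alone.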


Matching

Now, say you want to count the number of four letter words. This is where matching comes in handy. We are going to take a sentence, count the number of four letter words and display a message.

The first thing we need to do is create a Pattern object. Pattern has no public constructor, so we call the static Pattern.compile method and pass it a regular expression. Calling p.matcher(s) then creates a Matcher for our string. Each call to m.find() looks for the next match, so a while loop counts the matches. Then we display the output.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

int nCount = 0;
String s = "Hi, we are from planet dude and want to bring you cake.";

Pattern p = Pattern.compile("\\b\\w{4}\\b");
Matcher m = p.matcher(s);

// Each call to find() advances to the next four-letter word.
while (m.find()) {
	nCount++;
}

if (nCount == 0) {
	System.out.println("Good boy! +rep for you");
} else if (nCount == 1) {
	System.out.println("We will excuse you for using the cursed word.");
} else {
	System.out.println("OMFG! You use a lot of bad words. -rep, infracted, banned. *mad*");
}

Output:

OMFG! You use a lot of bad words. -rep, infracted, banned. *mad*


Try changing s so that it contains no four-letter words. What is the output? Then try it with exactly one four-letter word.
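Matcher can also tell you what it found and where. The sketch below, with hypothetical names FindWordsDemo and fourLetterWords, uses Matcher.group(), start() and end() on the same sentence:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindWordsDemo {
    private static final Pattern FOUR_LETTER = Pattern.compile("\\b\\w{4}\\b");

    // Collects every four-letter word in the order find() encounters them.
    static List<String> fourLetterWords(String s) {
        List<String> words = new ArrayList<>();
        Matcher m = FOUR_LETTER.matcher(s);
        while (m.find()) {
            words.add(m.group()); // group() is the text of the current match
        }
        return words;
    }

    public static void main(String[] args) {
        String s = "Hi, we are from planet dude and want to bring you cake.";
        Matcher m = FOUR_LETTER.matcher(s);
        while (m.find()) {
            // start() and end() give the position of the current match in s.
            System.out.println(m.group() + " at " + m.start() + "-" + m.end());
        }
        System.out.println(fourLetterWords(s).size() + " four-letter words");
    }
}
```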

You have now learned the basics of how the find and replace features in text editors work.

Regex Library

Earlier, I mentioned that I made a library of useful functions for validating text.

Here is one method from it:

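public boolean isValidName(String sName) {
        // Note: this strips every character that is not a letter, including the
        // spaces, apostrophes and ampersands that [&'\s]* below allows for, so
        // that part of the pattern never matches here. As written, a name passes
        // if, after stripping, it starts with an uppercase letter and has at
        // least two letters ("John Smith" -> "JohnSmith" passes, "john" fails).
        sName = sName.replaceAll("[^a-zA-Z]", "");
        String sPattern = "^([&'\\s]*[A-Z]\\S+)+";

        // Pattern p = Pattern.compile(sPattern,Pattern.CASE_INSENSITIVE);
        // to match strings in a case insensitive way
        Pattern p = Pattern.compile(sPattern);
        Matcher m = p.matcher(sName);

        return m.matches();
     }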

I can easily take this simple class and reuse its methods in other projects. I literally wrote it on the first day and used it for months without modification. It is very handy!
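One design point worth considering for a library like this: Pattern objects are immutable and thread-safe, so each pattern can be compiled once into a static final field instead of on every call (Matcher objects are not thread-safe, so a new one is created per call). The sketch below is not the author's library; RegexLibrary, isYes and isBlank are hypothetical, and isYes shows the Pattern.CASE_INSENSITIVE flag mentioned in the commented-out line above.

```java
import java.util.regex.Pattern;

// Hypothetical helper class in the spirit of the author's regex library.
final class RegexLibrary {
    // Compiled once and shared: Pattern objects are immutable and thread-safe.
    private static final Pattern YES = Pattern.compile("y|yes", Pattern.CASE_INSENSITIVE);
    // Zero or more whitespace characters, so the empty string counts as blank.
    private static final Pattern BLANK = Pattern.compile("\\s*");

    private RegexLibrary() {
        // static helpers only; no instances
    }

    static boolean isYes(String s) {
        return s != null && YES.matcher(s.trim()).matches();
    }

    static boolean isBlank(String s) {
        return s != null && BLANK.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isYes(" Yes ")); // true
        System.out.println(isYes("yep"));   // false
        System.out.println(isBlank("   ")); // true
    }
}
```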

Others

Surely there is a C++ programmer reading this and thinking, "WTF, why can't I do that?" Look here: regular expressions.

#2 Guest_Jordan_*

  • Guest

Posted 27 August 2009 - 03:56 AM

Regular expressions are amazingly handy and knowing how to use them is invaluable. +rep

#3 WingedPanther


    A spammer's worst nightmare

  • Moderator
  • 16,988 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others

Posted 27 August 2009 - 08:07 AM

No, I think Boost takes care of that minor "oversight" :) Also, C++0x has Regular Expressions as a proposed addition to the Standard Library.

Why didn't you mention the + and * quantifiers? [A-Z]+ just seems cleaner than [A-Z]{1,}
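For reference, the shorthand quantifiers map onto the brace forms like this: + is {1,}, * is {0,} and ? is {0,1}. A small sketch (QuantifierDemo and its methods are hypothetical names) that shows where they differ, most visibly on the empty string:

```java
class QuantifierDemo {
    static boolean oneOrMore(String s)  { return s.matches("[A-Z]+"); } // same as [A-Z]{1,}
    static boolean zeroOrMore(String s) { return s.matches("[A-Z]*"); } // same as [A-Z]{0,}
    static boolean zeroOrOne(String s)  { return s.matches("[A-Z]?"); } // same as [A-Z]{0,1}

    public static void main(String[] args) {
        for (String s : new String[] {"", "A", "ABC", "abc"}) {
            System.out.println("\"" + s + "\": + " + oneOrMore(s)
                    + ", * " + zeroOrMore(s) + ", ? " + zeroOrOne(s));
        }
    }
}
```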

Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

My MineCraft server site: http://banishedwings.enjin.com/


#4 chili5

    CC Mentor

  • Expert Member
  • 3,031 posts
  • Programming Language:Java, C#, PHP, JavaScript, Transact-SQL
  • Learning:C, Java, C++, C#, PHP, JavaScript, Transact-SQL, Assembly, Scheme

Posted 27 August 2009 - 08:21 AM

I didn't bother because I'm more used to using {1,}; it just makes more sense to me. You are right, though, that those quantifiers are cleaner. :)

I think that regular expressions should have been in the standard library a long time ago. Well, Boost takes care of all the minor oversights of C++. :)

#5 John


    CC Mentor

  • Moderator
  • 4,450 posts
  • Location:New York, NY

Posted 27 August 2009 - 08:27 AM

Very nice indeed. I've had the urge to make an advanced regular expressions tutorial, but for some reason, it hasn't become reality.

#6 WingedPanther


    A spammer's worst nightmare

  • Moderator
  • 16,988 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others

Posted 27 August 2009 - 08:44 AM

C++ was standardized before Regular Expressions were a "cool", "must-have" language feature.


#7 BlaineSch

    CC Leader

  • Expert Member
  • 1,559 posts

Posted 27 August 2009 - 12:47 PM

Very good +Rep!

John wrote: "Very nice indeed. I've had the urge to make an advanced regular expressions tutorial, but for some reason, it hasn't become reality."

You should!

#8 chili5

    CC Mentor

  • Expert Member
  • 3,031 posts
  • Programming Language:Java, C#, PHP, JavaScript, Transact-SQL
  • Learning:C, Java, C++, C#, PHP, JavaScript, Transact-SQL, Assembly, Scheme

Posted 27 August 2009 - 12:51 PM

Yes, John you should. :)

#9 BlaineSch

    CC Leader

  • Expert Member
  • 1,559 posts

Posted 27 August 2009 - 02:55 PM

Btw, it wouldn't let me rep you =/ I needa spread the love around a bit!

#10 chili5

    CC Mentor

  • Expert Member
  • 3,031 posts
  • Programming Language:Java, C#, PHP, JavaScript, Transact-SQL
  • Learning:C, Java, C++, C#, PHP, JavaScript, Transact-SQL, Assembly, Scheme

Posted 27 August 2009 - 03:28 PM

It's all good. :)

#11 Prog4rammer

    CC Newcomer

  • Just Joined
  • 14 posts

Posted 20 April 2010 - 08:14 AM

Thanks a lot, it's very cool :)


#12 GMVResources

    CC Resident

  • Just Joined
  • 71 posts

Posted 14 June 2010 - 03:13 PM

Nice tutorial rep+ :P