CodeCall Programming Forum > Software Development > Tutorials > Java Tutorials
#1  chili5, 08-27-2009, 06:57 AM
Regular Expressions

Regular Expressions in Java

Regular expressions provide a fast and flexible way to match strings and to find patterns inside strings. The String class uses them for operations such as replacing text. Text editors commonly use them for searching, and they can make many text-processing tasks much easier.

The package that you need to import is java.util.regex (import java.util.regex.*;). The two classes that you will need from it are Pattern and Matcher.

Regular Expressions

Regular expressions can be complicated, but they are a useful tool for manipulating and validating strings.

First we will look at some simple regular expressions.

Code:
[A-Z]
This is a character class. It matches any single uppercase letter from A to Z. This is the same as

Code:
return ch >= 'A' && ch <= 'Z';
Except, it is a string operation and not a character operation.

The above will match:

A, E, I, D, G

It will not match: a, e, i, o, !, ?.

Code:
[^A-Z]
This regular expression is used to match any character that is not an upper case letter.
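To see the two character classes side by side, here is a minimal sketch. The class and method names are made up for illustration, and it uses Pattern.matches, which tests the whole input against the expression.

```java
import java.util.regex.Pattern;

public class CharacterClassDemo {
    // True when the whole input is exactly one uppercase letter A-Z.
    static boolean isUpper(String s) {
        return Pattern.matches("[A-Z]", s);
    }

    // True when the whole input is exactly one character that is NOT A-Z.
    static boolean isNotUpper(String s) {
        return Pattern.matches("[^A-Z]", s);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"A", "G", "a", "!"}) {
            System.out.println(s + " -> [A-Z]: " + isUpper(s)
                    + ", [^A-Z]: " + isNotUpper(s));
        }
    }
}
```

Each input here is a single character, so exactly one of the two checks is true for it.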

We can use braces to indicate how many times to match an expression.

Example:

Code:
[A-Z]{1,}
This expression means that we want to match any uppercase letter at least once.

Matches: A, BC, DEF, GHIKL
No match: , d, ef, gh, ijk

In the No match list the first item is a blank entry (the empty string). It fails because the expression must be matched at least once.

Code:
[a-z]{4}
This regular expression matches exactly 4 lowercase letters.

Code:
[a-z]{4,5}
This regular expression means to match at least 4 lowercase letters, and at most 5 lowercase letters.

Now, what if you want to match something that is not in a character class?

Code:
[^AEIOUaeiou]
This regular expression will match any single character that is not a vowel (uppercase or lowercase). Note that this includes digits, spaces and punctuation, not just consonants.

There is a lot more that you can do, but that is enough for us to look at how to use regular expressions in Java. Have a look at this: regular expressions tutorial for more on regular expressions.
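The expressions above can be tried out with a short sketch like this one. The class name, method names and sample strings are my own, and it relies on String.matches (covered in the next section), which tests the whole string.

```java
public class QuantifierDemo {
    // One or more uppercase letters.
    static boolean atLeastOneUpper(String s) {
        return s.matches("[A-Z]{1,}");
    }

    // Exactly four lowercase letters.
    static boolean exactlyFourLower(String s) {
        return s.matches("[a-z]{4}");
    }

    // Four or five lowercase letters.
    static boolean fourOrFiveLower(String s) {
        return s.matches("[a-z]{4,5}");
    }

    // One or more characters, none of which is a vowel.
    static boolean noVowels(String s) {
        return s.matches("[^AEIOUaeiou]{1,}");
    }

    public static void main(String[] args) {
        System.out.println(atLeastOneUpper("DEF"));
        System.out.println(atLeastOneUpper(""));
        System.out.println(exactlyFourLower("abcde"));
        System.out.println(fourOrFiveLower("abcde"));
        System.out.println(noVowels("x1!"));
    }
}
```

Note that noVowels accepts "x1!" because digits and punctuation are also "not a vowel".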

String Manipulation

Before we look into pattern matching, we will use some regular expressions with the methods in the String class.

String.matches method

You have a string and you want to match it against a regular expression (to make sure it is valid). It might be a phone number, an email address, an SQL query or something else. Once you have written the regular expression, it is as easy as writing

Code:
if (s.matches("test")) {
	// Tests whether the whole of s matches "test" (not whether s contains it).
	// This is the same as if (s.equals("test")) but you will see why this is brilliant later.
	System.out.println("s  = test");
} else {
	System.out.println("s <> test");
}
Let us match a string against a phone number. The format of Canadian phone numbers is xxx-xxx-xxxx, where the area code is required. A simple regular expression for this would be:
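One point that is easy to miss: matches only succeeds when the whole string fits the expression. Here is a small sketch of the difference between that and searching inside a string. The class and method names are hypothetical, and Matcher.find is used for the "contains" case.

```java
import java.util.regex.Pattern;

public class WholeStringDemo {
    // True only if the entire string is "test".
    static boolean matchesWhole(String s) {
        return s.matches("test");
    }

    // True if "test" occurs anywhere inside the string.
    static boolean containsMatch(String s) {
        return Pattern.compile("test").matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "a test case";
        System.out.println(matchesWhole(s));
        System.out.println(containsMatch(s));
    }
}
```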

Code:
[0-9]{3}-{1}[0-9]{3}-{1}[0-9]{4}
This matches a 3-digit number, a dash, another 3-digit number, another dash and then a 4-digit number. (The {1} after each dash is optional, since a single character already matches exactly once.)

Consider this phone number:

111-111-1111

Is it valid? Let us use our regular expression to try it.

Code:
s = "111-111-1111";

if (s.matches("[0-9]{3}-{1}[0-9]{3}-{1}[0-9]{4}")) {
	System.out.println("Valid phone number.");
} else {
	System.out.println("Invalid phone number.");
}
The output is:
Quote:
Valid phone number.
Now let us change the phone number to:
111-111-111

Try this code:

Code:
s = "111-111-111";

if (s.matches("[0-9]{3}-{1}[0-9]{3}-{1}[0-9]{4}")) {
	System.out.println("Valid phone number.");
} else {
	System.out.println("Invalid phone number.");
}
Output:
Quote:
Invalid phone number.
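If the same check runs many times, one option is to compile the expression once and reuse it. This is only a sketch: the PhoneValidator class and isValidPhone method are names I made up, and the pattern is a shorter equivalent of the one above (\d means [0-9] by default in Java, and the {1} is dropped).

```java
import java.util.regex.Pattern;

public class PhoneValidator {
    // Compiled once and reused; equivalent to [0-9]{3}-{1}[0-9]{3}-{1}[0-9]{4}.
    private static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{3}-\\d{4}");

    // True when the whole string has the form xxx-xxx-xxxx.
    static boolean isValidPhone(String s) {
        return s != null && PHONE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidPhone("111-111-1111"));
        System.out.println(isValidPhone("111-111-111"));
    }
}
```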
The challenge with these is getting the regular expression right. Once you have it, regular expressions are great for validating user input in Swing applications. In my project, I created a regex library which contains all the regular expressions that I used throughout the project. I wrote it the first day, and I've used it every day for months since then. It is a very useful technique.

Even better, this concept applies in a lot of languages: Python, Java, Perl, PHP, VB. Almost any language you can name (except C++) has built-in support for regular expressions.

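String.replace and String.replaceAll methods

These methods are used to replace parts of a string.

The replace method takes two parameters. One is the literal text to find, and the other is what to replace it with. Note that replace does not treat its first parameter as a regular expression; to replace text that matches a pattern, use replaceAll (or replaceFirst), which takes a regular expression as its first parameter.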

Code:
String s = "Testing this is a something that is just a test.";
Let us replace the text "Test" with the word "game".

Code:
s = s.replace("Test","game");
The method returns a new string. Strings in Java are immutable, so the original string is not modified.

Output:

gameing this is a something that is just a test.

Notice that it is case-sensitive: only "Test" at the start was replaced, and the lowercase "test" at the end was left alone.

Let us do something more fun: we want to replace all four-letter words with ****. Why? We live on a planet where it is a federal offense to use four-letter words. A word is defined as a run of at least one uppercase or lowercase letter.
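Here is a rough sketch of how the literal and regular-expression versions differ. The class name, method names and the "t." sample are my own choices; the (?i) prefix is Java's inline flag for case-insensitive matching.

```java
public class ReplaceDemo {
    // replace() treats "t." as the literal two characters 't' and '.'.
    static String literal(String s) {
        return s.replace("t.", "X");
    }

    // replaceAll() treats "t." as a regex: 't' followed by any character.
    static String regex(String s) {
        return s.replaceAll("t.", "X");
    }

    // (?i) makes the regex ignore case, so "Test" and "test" both match.
    static String caseInsensitive(String s) {
        return s.replaceAll("(?i)test", "game");
    }

    public static void main(String[] args) {
        System.out.println(literal("test."));   // tesX
        System.out.println(regex("test."));     // XsX
        System.out.println(caseInsensitive(
                "Testing this is a something that is just a test."));
    }
}
```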

So our regular expression could be:

Code:
\\b\\w{4}\\b
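The \b matches a word boundary, so the four characters must form a complete word rather than part of a longer one. The \w matches a single word character (a letter, a digit or an underscore), and {4} repeats it exactly four times. The backslashes are doubled because the expression is written as a Java string literal.

Now, starting again from the original string s, we simply do: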

Code:
s = s.replaceAll("\\b\\w{4}\\b", "****");
System.out.println(s);
Output:
Quote:
Testing **** is a something **** is **** a ****.
Now on planet CC, nobody shall ever say 4 letter words again.
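Since \w also matches digits and the underscore, "words" like ab12 get censored too. If only letters should count, a character class can be used instead. This is just a sketch with made-up names and sample text:

```java
public class WordCharDemo {
    // \w matches letters, digits and underscore.
    static String censorWordChars(String s) {
        return s.replaceAll("\\b\\w{4}\\b", "****");
    }

    // [A-Za-z] restricts the match to letters only.
    static String censorLettersOnly(String s) {
        return s.replaceAll("\\b[A-Za-z]{4}\\b", "****");
    }

    public static void main(String[] args) {
        String s = "code ab12 a_b2 x";
        System.out.println(censorWordChars(s));   // **** **** **** x
        System.out.println(censorLettersOnly(s)); // **** ab12 a_b2 x
    }
}
```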


Matching

Now, say you want to count the number of four-letter words. This is where matching comes in handy. We are going to take a sentence, count the number of four-letter words and display a message.

The first thing we need to do is create a Pattern object. Pattern has no public constructor, so we call the static compile method and pass it a regular expression. We then call p.matcher(s) to get a Matcher for the string. Each call to m.find() moves on to the next match, so a while loop counts the number of matches. Then we display the output.

Code:
// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
int nCount = 0;
String s = "Hi, we are from planet dude and want to bring you cake.";

Pattern p = Pattern.compile("\\b\\w{4}\\b");
Matcher m = p.matcher(s);

// find() returns true once for each match in s
while (m.find()) {
	nCount++;
}

if (nCount == 0) {
	System.out.println("Good boy! +rep for you");
} else if (nCount == 1) {
	System.out.println("We will excuse you for using the cursed word.");
} else {
	System.out.println("OMFG! You use a lot of bad words. -rep, infracted, banned. *mad*");
}
Output:

Quote:
OMFG! You use a lot of bad words. -rep, infracted, banned. *mad*
Try changing s so that it contains zero 4-letter words. What is the output? Then try it with one 4-letter word.

You have now learned the basics of how searching in text editors works.
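Counting is not the only thing find() is good for. After each successful find(), m.group() returns the text that matched. A minimal sketch that collects the matches (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchLister {
    // Returns every four-character word found in s, in order.
    static List<String> fourLetterWords(String s) {
        List<String> words = new ArrayList<String>();
        Matcher m = Pattern.compile("\\b\\w{4}\\b").matcher(s);
        while (m.find()) {
            words.add(m.group()); // the text of the current match
        }
        return words;
    }

    public static void main(String[] args) {
        String s = "Hi, we are from planet dude and want to bring you cake.";
        System.out.println(fourLetterWords(s)); // [from, dude, want, cake]
    }
}
```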

Regex Library

Earlier, I mentioned that I made a library of useful functions for validating text.

Here is one method from it:

Code:
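// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
public boolean isValidName(String sName) {
        // Strip everything that is not a letter before validating.
        sName = sName.replaceAll("[^a-zA-Z]", "");
        // One or more groups of: optional &, ' or whitespace, then an
        // uppercase letter followed by at least one non-whitespace character.
        String sPattern = "^([&'\\s]*[A-Z]\\S+)+";

        // Pattern p = Pattern.compile(sPattern,Pattern.CASE_INSENSITIVE);
        // to match strings in a case insensitive way
        Pattern p = Pattern.compile(sPattern);
        Matcher m = p.matcher(sName);

        // matches() succeeds only if the whole string fits the pattern.
        return m.matches();
     }
I can easily take this simple class and reuse its methods in other projects. Literally, I wrote this the first day and used it for months without modification. It is very handy!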
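To give an idea of how such a library method behaves, here is a sketch that wraps the same logic in a static method with a precompiled pattern. The RegexLibrary class name and the sample names are assumptions of mine, not part of the original library.

```java
import java.util.regex.Pattern;

public class RegexLibrary {
    // Same expression as isValidName above, compiled once.
    private static final Pattern NAME = Pattern.compile("^([&'\\s]*[A-Z]\\S+)+");

    // Strips non-letters, then requires an uppercase first letter
    // followed by at least one more character.
    static boolean isValidName(String sName) {
        String lettersOnly = sName.replaceAll("[^a-zA-Z]", "");
        return NAME.matcher(lettersOnly).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"John", "O'Brien", "john", "J"}) {
            System.out.println(name + " -> " + isValidName(name));
        }
    }
}
```

With this logic "John" and "O'Brien" are accepted, while "john" (no leading uppercase letter) and "J" (too short) are rejected.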

Others

Surely, there is a C++ programmer reading this thinking, "WTF, why can I not do that?" Look here: regular expressions.
#2  Jordan, 08-27-2009, 07:56 AM
Re: Regular Expressions

Regular expressions are amazingly handy and knowing how to use them is invaluable. +rep
#3  WingedPanther, 08-27-2009, 12:07 PM
Re: Regular Expressions

No, I think Boost takes care of that minor "oversight". Also, C++0x has Regular Expressions as a proposed addition to the Standard Library.

Why didn't you mention the + and * quantifiers? [A-Z]+ just seems cleaner than [A-Z]{1,}
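For reference, + is shorthand for {1,} and * is shorthand for {0,}. A quick sketch to check the equivalence (class and method names are made up for illustration):

```java
public class ShorthandDemo {
    // + means "one or more", the same as {1,}.
    static boolean plus(String s) {
        return s.matches("[A-Z]+");
    }

    // * means "zero or more", the same as {0,}.
    static boolean star(String s) {
        return s.matches("[A-Z]*");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "A", "ABC", "abc"}) {
            System.out.println("\"" + s + "\" + : " + plus(s)
                    + "  {1,} : " + s.matches("[A-Z]{1,}")
                    + "  * : " + star(s)
                    + "  {0,} : " + s.matches("[A-Z]{0,}"));
        }
    }
}
```

The only input where + and * differ is the empty string, which * accepts and + rejects.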
#4  chili5, 08-27-2009, 12:21 PM
Re: Regular Expressions

I didn't bother because I'm more used to using {1,}; it just makes more sense to me. You are right, though, that those quantifiers are cleaner.

I think that regular expressions should have been in the standard library a long time ago. Well, Boost takes care of all the minor oversights of C++.
#5  John, 08-27-2009, 12:27 PM
Re: Regular Expressions

Very nice indeed. I've had the urge to make an advanced regular expressions tutorial, but for some reason, it hasn't become reality.
#6  WingedPanther, 08-27-2009, 12:44 PM
Re: Regular Expressions

C++ was standardized before Regular Expressions were a "cool", "must-have" language feature.
#7  BlaineSch, 08-27-2009, 04:47 PM
Re: Regular Expressions

Very good +Rep!

Quote:
Originally Posted by John View Post
Very nice indeed. I've had the urge to make an advanced regular expressions tutorial, but for some reason, it hasn't become reality.
You should!
#8  chili5, 08-27-2009, 04:51 PM
Re: Regular Expressions

Yes, John you should.
#9  BlaineSch, 08-27-2009, 06:55 PM
Re: Regular Expressions

Btw, it wouldn't let me rep you =/ I need to spread the love around a bit!
#10  chili5, 08-27-2009, 07:28 PM
Re: Regular Expressions

It's all good.
All times are GMT -5. The time now is 11:01 AM.

