Regular Expressions in Java
Regular expressions provide a flexible, compact way of matching patterns in strings. The String class uses them in methods such as matches and replaceAll. They are commonly used by text editors and can make some text-processing tasks much easier.
The package that you need to import is java.util.regex. It contains the two classes that you will need: Pattern and Matcher.
Regular Expressions
Regular expressions can be complicated, but they are a useful tool for manipulating and validating strings.
First we will look at some simple regular expressions.
Code:
    [A-Z]
This is a character class, and it will match any single character that is a letter between A and Z. This is the same as
Code:
    return ch >= 'A' && ch <= 'Z';
except that the regular expression is applied to a string and not to a single char.
The above will match:
A, E, I, D, G
It will not match: a, e, i, o, !, ?.
Code:
    [^A-Z]
This regular expression is used to match any single character that is not an uppercase letter.
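To see the two character classes in action, here is a small sketch. The class name CharClassDemo and its helper methods are made up for this illustration; the only library call is String.matches, which tests the whole string against the pattern.

```java
// Hypothetical demo class; the names are illustrative only.
class CharClassDemo {
    // True when s is exactly one uppercase letter from A to Z.
    static boolean isUpper(String s) {
        return s.matches("[A-Z]");
    }

    // True when s is exactly one character that is NOT an uppercase letter.
    static boolean isNotUpper(String s) {
        return s.matches("[^A-Z]");
    }

    // The char-based check from the text, for comparison.
    static boolean isUpperChar(char ch) {
        return ch >= 'A' && ch <= 'Z';
    }

    public static void main(String[] args) {
        for (String s : new String[] {"A", "G", "a", "!"}) {
            System.out.println(s + ": upper=" + isUpper(s)
                    + ", notUpper=" + isNotUpper(s)
                    + ", charCheck=" + isUpperChar(s.charAt(0)));
        }
    }
}
```

For single characters the regular expression and the char comparison agree; the regular expression version simply works on a String.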
We can use braces to indicate exactly how many times to match an expression.
Example:
Code:
    [A-Z]{1,}
This expression means that we want to match an uppercase letter at least once.
Matches: A, BC, DEF, GHIKL
No match: (empty string), d, ef, gh, ijk
The first item in the No match list is the empty string. It fails because the expression must be matched at least once.
Code:
    [a-z]{4}
This means that the regular expression will match exactly 4 lowercase letters.
Code:
    [a-z]{4,5}
This regular expression means to match at least 4 lowercase letters, and at most 5 lowercase letters.
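The brace quantifiers above can be checked with a few calls to String.matches. This is only a sketch: QuantifierDemo and its method names are invented for the example. It also shows that + is the standard shorthand for {1,}, which the text above does not use.

```java
// Hypothetical demo class; the names are illustrative only.
class QuantifierDemo {
    // One or more uppercase letters.
    static boolean atLeastOneUpper(String s) {
        return s.matches("[A-Z]{1,}");
    }

    // Same meaning as above, written with the + shorthand.
    static boolean atLeastOneUpperShort(String s) {
        return s.matches("[A-Z]+");
    }

    // Exactly four lowercase letters.
    static boolean exactlyFourLower(String s) {
        return s.matches("[a-z]{4}");
    }

    // Four or five lowercase letters.
    static boolean fourOrFiveLower(String s) {
        return s.matches("[a-z]{4,5}");
    }

    public static void main(String[] args) {
        System.out.println(atLeastOneUpper("GHIKL"));   // true
        System.out.println(atLeastOneUpper(""));        // false
        System.out.println(exactlyFourLower("abcde"));  // false
        System.out.println(fourOrFiveLower("abcde"));   // true
    }
}
```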
Now, what if you want to match something that is not in a character class?
Code:
    [^AEIOUaeiou]
This regular expression will match any single character that is not a vowel (uppercase or lowercase).
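One thing worth noticing is that "not a vowel" covers far more than consonants: spaces, digits and punctuation are also not vowels. A rough sketch, where NegatedClassDemo and countNonVowels are names made up for this example:

```java
// Hypothetical demo class; the names are illustrative only.
class NegatedClassDemo {
    // Counts the characters of s that match [^AEIOUaeiou].
    static int countNonVowels(String s) {
        int count = 0;
        for (char c : s.toCharArray()) {
            // Test each character on its own against the negated class.
            if (String.valueOf(c).matches("[^AEIOUaeiou]")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // J, v, the space, 8 and ! all count; the two a's do not.
        System.out.println(countNonVowels("Java 8!")); // prints 5
    }
}
```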
There is A LOT more that you can do, but that is enough for us to look at how to use regular expressions in Java. Have a look at this regular expressions tutorial for more on regular expressions.
String Manipulation
Before we look into pattern matching, we will use some regular expressions with the methods in the String class.
String.matches method
You have a string and you want to match it against a regular expression (to make sure it is valid). It might be a phone number, email, SQL query or something else. Once you have written the regular expression, it is as easy as writing
Code:
    if (s.matches("test")) {
        // True only when the whole of s matches the pattern "test".
        // For this pattern it is the same as s.equals("test"),
        // but you will see why this is brilliant later.
        System.out.println("s = test");
    } else {
        System.out.println("s <> test");
    }
Let us match a string against a phone number. The format of Canadian phone numbers is xxx-xxx-xxxx, where the area code is required. A simple regular expression for this would be:
Code:
    [0-9]{3}-{1}[0-9]{3}-{1}[0-9]{4}
This would match a 3 digit number, a dash, another 3 digit number, another dash and then a 4 digit number.
Consider this phone number:
111-111-1111
Is it valid? Let us use our regular expression to try it.
Code:
    s = "111-111-1111";
    if (s.matches("[0-9]{3}-{1}[0-9]{3}-{1}[0-9]{4}")) {
        System.out.println("Valid phone number.");
    } else {
        System.out.println("Invalid phone number.");
    }
The output is:
Valid phone number.
Now let us change the phone number to:
111-111-111
Try this code:
Code:
    s = "111-111-111";
    if (s.matches("[0-9]{3}-{1}[0-9]{3}-{1}[0-9]{4}")) {
        System.out.println("Valid phone number.");
    } else {
        System.out.println("Invalid phone number.");
    }
Output:
Invalid phone number.
The challenge with these is getting the regular expression right. Once you have it, they are great for use in Swing applications for validating user input. In my project, I created a regex library which contains all the regular expressions that I used throughout my project. I wrote it the first day, and I've used it every day for months since then. It is a very USEFUL area.
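One detail of String.matches is worth seeing in isolation: it succeeds only when the whole string matches the pattern. To look for a phone number somewhere inside a longer string, Matcher.find is the usual tool. A minimal sketch, assuming the same phone pattern as above; PhoneDemo and its method names are made up for this example:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class PhoneDemo {
    // The phone number pattern from the tutorial, unchanged.
    static final String PHONE = "[0-9]{3}-{1}[0-9]{3}-{1}[0-9]{4}";

    // True only when the entire string is a phone number.
    static boolean isPhone(String s) {
        return s.matches(PHONE);
    }

    // True when a phone number appears anywhere inside the string.
    static boolean containsPhone(String s) {
        return Pattern.compile(PHONE).matcher(s).find();
    }

    public static void main(String[] args) {
        String text = "call 111-111-1111 now";
        System.out.println(isPhone(text));       // false: extra text around the number
        System.out.println(containsPhone(text)); // true: the number is in there
    }
}
```

For validating a single input field, the whole-string behaviour of matches is usually what you want.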
Even better, this concept applies in a lot of languages: Python, Java, Perl, PHP, VB. Almost any language you can name (except C++, at least in its standard library at the time of writing) has built-in support for regular expressions.
String.replace method
This method is used to replace parts of a string. Note that replace treats its first argument as literal text, not as a regular expression; replaceAll and replaceFirst are the String methods that take a regular expression.
We use the replace method and give it two parameters. One is the text to replace, and the other is what to replace it with.
Code:
    String s = "Testing this is a something that is just a test.";
Let us replace the word "Test" with the word "game".
Code:
    s = s.replace("Test", "game");
The method returns a new string; the original string is not modified, so we assign the result back to s.
Output:
gameing this is a something that is just a test.
Notice that it is case-sensitive.
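It is worth seeing how replace and replaceAll differ, because String.replace takes literal text while String.replaceAll takes a regular expression. The sketch below assumes the same sentence as above; ReplaceDemo and its method names are invented for the example. The (?i) flag at the start of a pattern turns on case-insensitive matching.

```java
// Hypothetical demo class; the names are illustrative only.
class ReplaceDemo {
    static final String SENTENCE = "Testing this is a something that is just a test.";

    // Literal, case-sensitive replacement: only "Test" is changed.
    static String literal() {
        return SENTENCE.replace("Test", "game");
    }

    // Regular expression replacement, case-insensitive via (?i).
    static String caseInsensitive() {
        return SENTENCE.replaceAll("(?i)test", "game");
    }

    // With replace, "." is just a dot.
    static String literalDot() {
        return "a.b.c".replace(".", "-");
    }

    // With replaceAll, "." is a regular expression matching any character.
    static String regexDot() {
        return "a.b.c".replaceAll(".", "-");
    }

    public static void main(String[] args) {
        System.out.println(literal());
        System.out.println(caseInsensitive());
        System.out.println(literalDot()); // a-b-c
        System.out.println(regexDot());   // -----
    }
}
```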
Let us do something more fun: we want to replace all four letter words with ****. Why? We live on a planet where it is a federal offense to use four letter words. A word is defined as a run of word characters: letters, digits or underscores.
So our regular expression could be:
Code:
    \\b\\w{4}\\b
The \b means to match a word boundary, so the four characters must form a whole word and not part of a longer one. The \w means to match a single word character (a letter, a digit or an underscore). The backslashes are doubled because the expression is written inside a Java string literal.
Now, starting again from the original sentence, we simply do:
Code:
    s = s.replaceAll("\\b\\w{4}\\b", "****");
    System.out.println(s);
Output:
Testing **** is a something **** is **** a ****.
Now on planet CC, nobody shall ever say 4 letter words again.
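Because \w also matches digits and underscores, the expression censors things like four digit numbers too. If only letters should count, a character class such as [A-Za-z] can stand in for \w. This is just a sketch of the difference; FourLetterDemo and its method names are made up for the example.

```java
// Hypothetical demo class; the names are illustrative only.
class FourLetterDemo {
    // The tutorial's expression: any four word characters (letters, digits, underscore).
    static String censorWordChars(String s) {
        return s.replaceAll("\\b\\w{4}\\b", "****");
    }

    // A stricter variant: exactly four ASCII letters.
    static String censorLettersOnly(String s) {
        return s.replaceAll("\\b[A-Za-z]{4}\\b", "****");
    }

    public static void main(String[] args) {
        String s = "Call 5551 from home";
        System.out.println(censorWordChars(s));   // **** **** **** ****
        System.out.println(censorLettersOnly(s)); // **** 5551 **** ****
    }
}
```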
Matching
Now, say you want to count the number of four letter words. This is where matching comes in handy. We are going to take a sentence, count the number of four letter words and display a message.
The first thing we need to do is create a Pattern object. We can't use a constructor; instead we use the static compile method and pass it a regular expression. We use p.matcher to set up for matching against the string. Now we just use a while loop with m.find() to count the number of matches. Then we display the output.
Code:
    // Needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
    int nCount = 0;
    String s = "Hi, we are from planet dude and want to bring you cake.";
    Pattern p = Pattern.compile("\\b\\w{4}\\b");
    Matcher m = p.matcher(s);
    // Each call to find() moves on to the next match, if there is one.
    while (m.find()) {
        nCount++;
    }
    if (nCount == 0) {
        System.out.println("Good boy! +rep for you");
    } else if (nCount == 1) {
        System.out.println("We will excuse you for using the cursed word.");
    } else {
        System.out.println("OMFG! You use a lot of bad words. -rep, infracted, banned. *mad*");
    }
Output:
OMFG! You use a lot of bad words. -rep, infracted, banned. *mad*
Try changing s so that it contains zero 4 letter words. What is the output? Try one 4 letter word.
You have now learned the basics of how searching in text editors works.
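Counting is only one thing the find loop can do. Inside the loop, Matcher.group() returns the text of the current match and Matcher.start() returns its position. A small sketch that collects the four letter words instead of counting them; MatchListDemo and fourLetterWords are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class MatchListDemo {
    // Returns every four character word found in s, in order of appearance.
    static List<String> fourLetterWords(String s) {
        List<String> words = new ArrayList<>();
        Matcher m = Pattern.compile("\\b\\w{4}\\b").matcher(s);
        while (m.find()) {
            words.add(m.group()); // text of the current match
        }
        return words;
    }

    public static void main(String[] args) {
        String s = "Hi, we are from planet dude and want to bring you cake.";
        Matcher m = Pattern.compile("\\b\\w{4}\\b").matcher(s);
        while (m.find()) {
            // start() is the index in s where the current match begins.
            System.out.println(m.group() + " at index " + m.start());
        }
        System.out.println(fourLetterWords(s)); // [from, dude, want, cake]
    }
}
```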
Regex Library
Earlier, I mentioned that I made a library of useful functions for validating text.
Here is one method from it:
Code:
    public boolean isValidName(String sName) {
        // Strip everything that is not a letter before testing.
        sName = sName.replaceAll("[^a-zA-Z]", "");
        String sPattern = "^([&'\\s]*[A-Z]\\S+)+";
        // To match strings in a case insensitive way, use this instead:
        // Pattern p = Pattern.compile(sPattern, Pattern.CASE_INSENSITIVE);
        Pattern p = Pattern.compile(sPattern);
        Matcher m = p.matcher(sName);
        return m.matches();
    }
I can easily take this simple class and reuse the methods in other projects. Literally, I wrote this the first day and used it for months without modification. It is very handy!
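The rest of the library is not shown here, so the following is only a guess at the general shape such a class might take, not the actual code. RegexLibrary, its constants and its method names are all hypothetical. The one design point it illustrates is compiling each Pattern once as a static final field and reusing it, instead of recompiling on every call.

```java
import java.util.regex.Pattern;

// Hypothetical utility class; every name here is an assumption for illustration.
final class RegexLibrary {
    // Compiled once and reused; Pattern objects are immutable and safe to share.
    private static final Pattern PHONE =
            Pattern.compile("[0-9]{3}-[0-9]{3}-[0-9]{4}");
    private static final Pattern FOUR_LETTER_WORD =
            Pattern.compile("\\b\\w{4}\\b");

    private RegexLibrary() {
        // Static helpers only; no instances needed.
    }

    // True when the whole string is a phone number in xxx-xxx-xxxx form.
    static boolean isValidPhone(String s) {
        return s != null && PHONE.matcher(s).matches();
    }

    // True when the string contains at least one four character word.
    static boolean containsFourLetterWord(String s) {
        return s != null && FOUR_LETTER_WORD.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isValidPhone("111-111-1111"));             // true
        System.out.println(isValidPhone("111-111-111"));              // false
        System.out.println(containsFourLetterWord("bring you cake")); // true
    }
}
```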
Others
Surely, there is a C++ programmer reading this thinking WTF can I not do that? Look here: regular expressions.
Regular expressions are amazingly handy and knowing how to use them is invaluable. +rep
No, I think Boost takes care of that minor "oversight". Also, C++0x has Regular Expressions as a proposed addition to the Standard Library.
Why didn't you mention the + and * quantifiers? [A-Z]+ just seems cleaner than [A-Z]{1,}
I didn't bother because I'm more used to the brace form such as {1,}; it just makes more sense to me. You are right though that those quantifiers are cleaner.
I think that regular expressions should have been in the standard library a long time ago. Well, Boost takes care of all the minor oversights of C++.
Very nice indeed. I've had the urge to make an advanced regular expressions tutorial, but for some reason, it hasn't become reality.
C++ was standardized before Regular Expressions were a "cool", "must-have" language feature.
Yes, John, you should.
Btw, it wouldn't let me rep you =/ I needa spread the love around a bit!
It's all good.