Thread: Regular Expressions

  1. #1
    chili5

    Regular Expressions

    Regular Expressions in Java

    Regular expressions provide a compact and flexible way of matching strings and finding patterns within strings. The String class uses them for tasks such as replacing text. They are commonly used by text editors and can make some tasks really easy.

    The package you need to import is java.util.regex. The two classes you will need from it are Pattern and Matcher.

    Regular Expressions

    Regular expressions can be complicated, but they are a useful tool for manipulating and validating strings.

    First we will look at some simple regular expressions.

    Code:
    [A-Z]
    This is a character class. It matches any single uppercase letter from A to Z. This is the same test as

    Code:
    return ch >= 'A' && ch <= 'Z';
    except that the regular expression is applied to a string rather than to a single char.

    The above will match:

    A, E, I, D, G

    It will not match: a, e, i, o, !, ?.

    Code:
    [^A-Z]
    This regular expression is used to match any character that is not an upper case letter.
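
    As a rough sketch of the two character classes above, the following uses String.matches (covered further down in this tutorial) on one-character strings. The class and method names are made up for illustration.

```java
class CharClassDemo {
    // Hypothetical helper: true when the whole string is a single letter A-Z.
    static boolean isSingleUpper(String s) {
        return s.matches("[A-Z]");
    }

    // Hypothetical helper: true when the whole string is a single character
    // that is NOT an uppercase letter.
    static boolean isSingleNonUpper(String s) {
        return s.matches("[^A-Z]");
    }

    public static void main(String[] args) {
        System.out.println(isSingleUpper("G"));     // true
        System.out.println(isSingleUpper("g"));     // false
        System.out.println(isSingleNonUpper("!"));  // true
        System.out.println(isSingleNonUpper("AB")); // false: two characters
    }
}
```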

    We can use braces to indicate how many times an expression must match.

    Example:

    Code:
    [A-Z]{1,}
    This expression means that we want to match any uppercase letter at least once.

    Matches: A, BC, DEF, GHIKL
    No match: , d, ef, gh, ijk

    The first item in the No match list is the empty string. It does not match because the expression must be matched at least once.

    Code:
    [a-z]{4}
    This regular expression matches exactly 4 lowercase letters.

    Code:
    [a-z]{4,5}
    This regular expression means to match at least 4 lowercase letters, and at most 5 lowercase letters.
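
    Here is a small sketch of the three brace forms above, again using String.matches on whole strings. The helper names are hypothetical and only there to make the cases easy to try.

```java
class QuantifierDemo {
    // {1,} : one or more uppercase letters
    static boolean atLeastOneUpper(String s) {
        return s.matches("[A-Z]{1,}");
    }

    // {4} : exactly four lowercase letters
    static boolean exactlyFourLower(String s) {
        return s.matches("[a-z]{4}");
    }

    // {4,5} : four or five lowercase letters
    static boolean fourOrFiveLower(String s) {
        return s.matches("[a-z]{4,5}");
    }

    public static void main(String[] args) {
        System.out.println(atLeastOneUpper("GHIKL"));  // true
        System.out.println(atLeastOneUpper(""));       // false: needs at least one letter
        System.out.println(exactlyFourLower("cake"));  // true
        System.out.println(exactlyFourLower("cakes")); // false: five letters
        System.out.println(fourOrFiveLower("cakes"));  // true
        System.out.println(fourOrFiveLower("cookie")); // false: six letters
    }
}
```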

    Now, what if you want to match something that is not in a character class?

    Code:
    [^AEIOUaeiou]
    This regular expression will match anything that is not a vowel (uppercase or lowercase).
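
    One thing worth trying with the negated class is that "not a vowel" also covers digits, spaces and punctuation, not just consonants. A minimal sketch, with hypothetical helper names:

```java
class VowelDemo {
    // True when the whole string is one character that is not a vowel.
    static boolean isNonVowel(String s) {
        return s.matches("[^AEIOUaeiou]");
    }

    // Removes every non-vowel character, leaving only the vowels.
    static String keepVowels(String s) {
        return s.replaceAll("[^AEIOUaeiou]", "");
    }

    public static void main(String[] args) {
        System.out.println(isNonVowel("b"));           // true
        System.out.println(isNonVowel("e"));           // false
        System.out.println(isNonVowel("7"));           // true: digits are not vowels either
        System.out.println(keepVowels("Hello World")); // eoo
    }
}
```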

    There is a lot more that you can do, but that is enough to look at how to use regular expressions in Java. For more on the syntax itself, have a look at a dedicated regular expressions tutorial.

    String Manipulation

    Before we look into pattern matching, we will use some regular expressions with the methods in the String class.

    String.matches method

    You have a string and you want to match it against a regular expression (to make sure it is valid). It might be a phone number, email, SQL query or something else. Once you have written the expression, the check is as easy as writing

    Code:
    if (s.matches("test")) {
        // matches() tests whether the ENTIRE string matches the pattern, so this is
        // the same as if (s.equals("test")), but you will see why this is brilliant later.
        System.out.println("s  = test");
    } else {
        System.out.println("s <> test");
    }
    Let us match a string against a phone number. The format of Canadian phone numbers is xxx-xxx-xxxx, where the area code is required. A simple regular expression for this would be:

    Code:
    [0-9]{3}-{1}[0-9]{3}-{1}[0-9]{4}
    This would match a 3 digit number, a dash followed by another 3 digit number, followed by a dash and then followed by a 4 digit number.

    Consider this phone number:

    111-111-1111

    Is it valid? Let us use our regular expression to check it.

    Code:
    s = "111-111-1111";

    if (s.matches("[0-9]{3}-{1}[0-9]{3}-{1}[0-9]{4}")) {
        System.out.println("Valid phone number.");
    } else {
        System.out.println("Invalid phone number.");
    }
    The output is:
    Valid phone number.
    Now let us change the phone number to:
    111-111-111

    Try this code:

    Code:
    s = "111-111-111";

    if (s.matches("[0-9]{3}-{1}[0-9]{3}-{1}[0-9]{4}")) {
        System.out.println("Valid phone number.");
    } else {
        System.out.println("Invalid phone number.");
    }
    Output:
    Invalid phone number.
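
    If the same check runs many times, one option is to compile the expression once and reuse it. The sketch below assumes a hypothetical PhoneValidator class. It also drops the {1} after each dash, since a single occurrence is already the default.

```java
import java.util.regex.Pattern;

class PhoneValidator {
    // Compiled once and shared. "-" on its own already means exactly one dash.
    private static final Pattern PHONE =
            Pattern.compile("[0-9]{3}-[0-9]{3}-[0-9]{4}");

    // True only when the whole string has the form xxx-xxx-xxxx.
    static boolean isValidPhone(String s) {
        return s != null && PHONE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidPhone("111-111-1111")); // true
        System.out.println(isValidPhone("111-111-111"));  // false
    }
}
```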
    The challenge with these is getting the regular expression right. Once you have it, they are great for validating user input in Swing applications. In my project, I created a regex library containing all the regular expressions that I used throughout the project. I wrote it the first day, and I've used it every day for months since then. It is a very useful technique.

    Even better, this concept applies in a lot of languages: Python, Java, Perl, PHP, VB. Almost any language you can name (except C++) has built-in support for regular expressions.

    String.replace method

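    This method is used to replace parts of a string.

    We call the replace method with two parameters. The first is the text to look for, and the second is what to replace it with. Note that replace treats the first parameter as literal text. To replace whatever matches a regular expression, use replaceAll (or replaceFirst), as we do further down.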

    Code:
    String s = "Testing this is a something that is just a test.";
    Let us replace the text "Test" with the word "game".

    Code:
    s = s.replace("Test","game");
    Strings are immutable, so the method returns a new string instead of changing the original.

    Output:

    gameing this is a something that is just a test.

    Notice that it is case-sensitive: "Testing" became "gameing", but the lowercase "test" at the end was left alone.
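
    To see the difference between literal and regex replacement, here is a small sketch. String.replace treats its first argument as plain text, while replaceAll and replaceFirst treat it as a regular expression. The (?i) flag at the start of a pattern makes it case-insensitive. The class and method names are hypothetical.

```java
class ReplaceDemo {
    // Literal replacement: the dot is just a dot.
    static String literalDots(String s) {
        return s.replace(".", "-");
    }

    // Regex replacement: an unescaped dot matches ANY character.
    static String regexDots(String s) {
        return s.replaceAll(".", "-");
    }

    // Regex replacement with (?i) so that "Test" and "test" are both replaced.
    static String replaceTestIgnoreCase(String s) {
        return s.replaceAll("(?i)test", "game");
    }

    public static void main(String[] args) {
        System.out.println(literalDots("a.b.c")); // a-b-c
        System.out.println(regexDots("a.b.c"));   // -----
        System.out.println(replaceTestIgnoreCase(
                "Testing this is a something that is just a test."));
        // gameing this is a something that is just a game.
    }
}
```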

    Let us do something more fun. Starting again from the original string, we want to replace all four-letter words with ****. Why? We live on a planet where it is a federal offense to use four-letter words. For our purposes, a word is a run of word characters with a word boundary on each side.

    So our regular expression could be:

    Code:
    \\b\\w{4}\\b
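    The \b matches a word boundary, so only complete words are matched and four-character pieces of longer words are ignored. The \w matches a single word character: an uppercase or lowercase letter, a digit, or an underscore. The backslashes are doubled because the pattern is written as a Java string literal.

    Now we simply do: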

    Code:
    s = s.replaceAll("\\b\\w{4}\\b", "****");
    System.out.println(s);
    Output:
    Testing **** is a something **** is **** a ****.
    Now on planet CC, nobody shall ever say 4 letter words again.
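
    One caveat to try out: because \w also matches digits and underscores, a token like 1234 gets censored too. If only letters should count, a character class can be swapped in. This is a sketch with hypothetical method names, not the only way to do it.

```java
class FourLetterCensor {
    // Same expression as above: \w covers letters, digits and underscore.
    static String censorWordChars(String s) {
        return s.replaceAll("\\b\\w{4}\\b", "****");
    }

    // Variant that only censors runs of exactly four ASCII letters.
    static String censorLettersOnly(String s) {
        return s.replaceAll("\\b[A-Za-z]{4}\\b", "****");
    }

    public static void main(String[] args) {
        String s = "Call 1234 for cake";
        System.out.println(censorWordChars(s));   // **** **** for ****
        System.out.println(censorLettersOnly(s)); // **** 1234 for ****
    }
}
```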


    Matching

    Now, say you want to count the number of four letter words. This is where matching comes in handy. We are going to take a sentence, count the number of four letter words and display a message.

    The first thing we need to do is create a Pattern object. Pattern has no public constructor, so we call the static compile method and pass it a regular expression. We then call p.matcher(s) to get a Matcher for the string. Now we just use a while loop with m.find() to count the number of matches. Then we display the output.

    Code:
    int nCount = 0;
    String s = "Hi, we are from planet dude and want to bring you cake.";
    
    Pattern p = Pattern.compile("\\b\\w{4}\\b"); 
    Matcher m = p.matcher(s);
    
    while (m.find()) {
        nCount++;
    }
    
    if (nCount == 0) {
    	System.out.println("Good boy! +rep for you");
    } else if (nCount == 1) {
    	System.out.println("We will excuse you for using the cursed word.");
    } else {
    	System.out.println("OMFG! You use a lot of bad words. -rep, infracted, banned. *mad*");
    }
    Output:

    OMFG! You use a lot of bad words. -rep, infracted, banned. *mad*
    Try changing s so that it contains zero four-letter words. What is the output? Then try it with exactly one four-letter word.
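
    If you want to see which words were matched and not just how many, Matcher.group() returns the text of the current match. A minimal sketch, assuming a hypothetical FourLetterFinder class:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FourLetterFinder {
    private static final Pattern FOUR = Pattern.compile("\\b\\w{4}\\b");

    // Collects every four-character word found in s, in order of appearance.
    static List<String> findFourLetterWords(String s) {
        List<String> words = new ArrayList<>();
        Matcher m = FOUR.matcher(s);
        while (m.find()) {
            words.add(m.group()); // text of the match that find() just located
        }
        return words;
    }

    public static void main(String[] args) {
        String s = "Hi, we are from planet dude and want to bring you cake.";
        System.out.println(findFourLetterWords(s)); // [from, dude, want, cake]
    }
}
```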

    You have now learned the basics of how find and replace in a text editor works.

    Regex Library

    Earlier, I mentioned that I made a library of useful functions for validating text.

    Here is one method from it:

    Code:
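    public boolean isValidName(String sName) {
        // Strip everything except ASCII letters before validating.
        sName = sName.replaceAll("[^a-zA-Z]", "");
        // An uppercase letter followed by at least one more non-whitespace character.
        // After the stripping above, the optional [&'\s]* prefix has nothing left to match.
        String sPattern = "^([&'\\s]*[A-Z]\\S+)+";

        // To match strings in a case-insensitive way, use instead:
        // Pattern p = Pattern.compile(sPattern, Pattern.CASE_INSENSITIVE);
        Pattern p = Pattern.compile(sPattern);
        Matcher m = p.matcher(sName);

        // matches() requires the whole string to match the pattern.
        return m.matches();
    }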
    I can easily take this simple class and reuse the methods in other projects. Literally, I wrote this the first day and used it for months without modification. It is very handy!
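
    To get a feel for what the method above accepts, here is a sketch that wraps the same logic in a hypothetical NameValidator class (made static here only so it is easy to call) and tries a few inputs. In effect, the name must start with an uppercase letter and contain at least two letters once everything else has been stripped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameValidator {
    // Same expression as in isValidName above, compiled once.
    private static final Pattern NAME = Pattern.compile("^([&'\\s]*[A-Z]\\S+)+");

    static boolean isValidName(String sName) {
        // Remove everything that is not an ASCII letter, as the original does.
        String lettersOnly = sName.replaceAll("[^a-zA-Z]", "");
        Matcher m = NAME.matcher(lettersOnly);
        return m.matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("John Smith")); // true
        System.out.println(isValidName("O'Neil"));     // true
        System.out.println(isValidName("john"));       // false: no leading uppercase letter
        System.out.println(isValidName("J"));          // false: only one letter
        System.out.println(isValidName("123"));        // false: nothing left after stripping
    }
}
```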

    Others

    Surely there is a C++ programmer reading this and thinking, "WTF, why can I not do that?" Look here: regular expressions.

  2. #2
    Jordan

    Re: Regular Expressions

    Regular expressions are amazingly handy and knowing how to use them is invaluable. +rep

  3. #3
    WingedPanther

    Re: Regular Expressions

    No, I think Boost takes care of that minor "oversight". Also, C++0x has Regular Expressions as a proposed addition to the Standard Library.

    Why didn't you mention the + and * quantifiers? [A-Z]+ just seems cleaner than [A-Z]{1,}
    CodeCall Blog | CodeCall Wiki
    Programming is a branch of mathematics.
    My CodeCall Blog | My Personal Blog

  4. #4
    chili5

    Re: Regular Expressions

    I didn't bother because I'm more used to using the brace form; it just makes more sense to me. You are right though that + and * are cleaner.

    I think that regular expressions should have been in the standard library a long time ago. Well, Boost takes care of all the minor oversights of C++.

  5. #5
    John

    Re: Regular Expressions

    Very nice indeed. I've had the urge to make an advanced regular expressions tutorial, but for some reason, it hasn't become reality.

  6. #6
    WingedPanther

    Re: Regular Expressions

    C++ was standardized before Regular Expressions were a "cool", "must-have" language feature.

  7. #7
    BlaineSch

    Re: Regular Expressions

    Very good +Rep!

    Originally posted by John:
    Very nice indeed. I've had the urge to make an advanced regular expressions tutorial, but for some reason, it hasn't become reality.
    You should!

  8. #8
    chili5

    Re: Regular Expressions

    Yes, John you should.

  9. #9
    BlaineSch

    Re: Regular Expressions

    Btw, wouldnt let me rep you =/ I needa spread the love around a bit!

  10. #10
    chili5

    Re: Regular Expressions

    It's all good.
