Jump to content

Regular expression matching: Predictive features?

- - - - -

  • Please log in to reply
7 replies to this topic

#1
gregwarner

gregwarner

    Programming God

  • Members
  • PipPipPipPipPipPipPip
  • 853 posts
  • Location:Arkansas
Is it possible with Java's Pattern matcher (or other reg. ex. matcher), based on the number of characters matched thus far on a partial match, retrieve an array of possible valid characters for the next character in the input sequence? Here's what I mean:

I'm trying to implement something like a smart text field, in which I use a reg. ex. to provide some sort of a formatting "mask" on the user's input. Take, for example, a mask which would accept only social security numbers:

String ssn_pattern = "\\d{3}-\\d{2}-\\d{4}";

I could easily use a DocumentFilter with a Pattern Matcher with the hitEnd() method to see if I have a partial match, so I could reject invalid characters on a letter-by-letter basis as the user is typing. However, here's what I'd like to do:

With each keystroke, I'd like the DocumentFilter to look at the pattern and retrieve an array of possible characters that could come next in the pattern. So, if the user types a "1" into the text field, the set of next valid characters would be {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. But, when the user types "123", the set of next valid characters would be {-}. There is only one element in this set, and so I'd like the DocumentFilter to automatically append this character onto the input sequence.

So effectively what I'll have is a text field that, as the user types, both validates a partial input sequence, as well as looks ahead one character, and if there is only one valid character that could come next, proactively types it in for the user.

The benefit would be that fields that require formatting could be keyed in without worrying about the symbols that should be interspersed in the text, while automatically adding and displaying those symbols to the user as he or she is typing.

Possible?
Hofstadter's Law: It always takes longer than you expect, even when you take into account Hofstadter's Law.

– Douglas Hofstadter, Gödel, Escher, Bach: An Eternal Golden Braid


#2
wim DC

wim DC

    Writes binary right handed and hex left handed

  • Members
  • PipPipPipPipPipPipPipPipPip
  • 2,084 posts
  • Programming Language:Java, JavaScript, PL/SQL
  • Learning:Java
Propably the Pattern class:
Pattern (Java 2 Platform SE v1.4.2)
it has a matcher method which returns a Matcher object:
Matcher (Java 2 Platform SE v1.4.2)
And this one has the following method:

Quote

end

public int end()
Returns the index of the last character matched, plus one.
Returns:
The index of the last character matched, plus one
Throws:
IllegalStateException - If no match has yet been attempted, or if the previous match operation failed


#3
gregwarner

gregwarner

    Programming God

  • Members
  • PipPipPipPipPipPipPip
  • 853 posts
  • Location:Arkansas
That only returns the index. I'm needing to get the actual characters that would result in a match if they were added.
Hofstadter's Law: It always takes longer than you expect, even when you take into account Hofstadter's Law.

– Douglas Hofstadter, Gödel, Escher, Bach: An Eternal Golden Braid


#4
gregwarner

gregwarner

    Programming God

  • Members
  • PipPipPipPipPipPipPip
  • 853 posts
  • Location:Arkansas
Let me reword my question:

Given a regular expression, r, that matches a string of minimum length, n, and a candidate string, c, of length n - x, where (0 ≤ xn), and c is a match or partial match on r, how can one find the set of all valid characters, V, such that c + Vi is a match or partial match on r, where "+" is the string concatenation operator and Vi is an element of V?
Hofstadter's Law: It always takes longer than you expect, even when you take into account Hofstadter's Law.

– Douglas Hofstadter, Gödel, Escher, Bach: An Eternal Golden Braid


#5
wim DC

wim DC

    Writes binary right handed and hex left handed

  • Members
  • PipPipPipPipPipPipPipPipPip
  • 2,084 posts
  • Programming Language:Java, JavaScript, PL/SQL
  • Learning:Java
Well, as far as I know I don't know any method or any class which does that so... DIY :)
I'm not sure if the following code can be improved greatly or whether it is worth the processing power.

It's a social security number generator :D, with regexes

public class Regex {


    public static void main(String... args){

        Pattern pattern = Pattern.compile("\\d{3}-\\d{2}-\\d{4}");

        Matcher matcher = pattern.matcher("");

        Random random = new Random();

        String currentValue = "";

        

        List<Character> possibilities = getNextPosibilities(matcher, currentValue); 

        

        while(!possibilities.isEmpty()){

           currentValue += possibilities.get(random.nextInt(possibilities.size())); 

           System.out.println(currentValue);

           possibilities = getNextPosibilities(matcher, currentValue); 

        }

    }


    private static List<Character> getNextPosibilities(Matcher matcher, String currentValue){

        List<Character> possibles = new ArrayList<Character>();


        for(char c=0; c<127 ; c++){

            matcher.reset(currentValue+c);

            if(matcher.matches() || matcher.hitEnd()){

                possibles.add(c);

            }

        }


        return possibles;

    }


}

(I took c until 127 as that's where the basic ascii table ends Ascii Table)

(And watch out with '-' in a pattern, it's better escaped to prevent it being seen as a range indicator like used in [a-z])

#6
gregwarner

gregwarner

    Programming God

  • Members
  • PipPipPipPipPipPipPip
  • 853 posts
  • Location:Arkansas
Yeah, I wanted to avoid checking every single character if I could, because it would be a huge performance drain since the logic will be checking the next character for each key the user types.

I've been Googling, and I think the only way I can do this with any efficiency is by implementing a DFA. (Deterministic Finite-state Automaton) I would have to construct the DFA for the reg. ex. when the text field is created, and traverse the states of the machine as the user types each letter. I could write it in such a way that each state could return a List of states connected to itself, and get the information I'm needing that way. In essence, I'm not needing to know literally all the valid characters, only what the set size is. If the set size is 1, I can look up what character that is.

Thanks for the input, though. Looks like my "brilliant" idea just added another couple days of work.
Hofstadter's Law: It always takes longer than you expect, even when you take into account Hofstadter's Law.

– Douglas Hofstadter, Gödel, Escher, Bach: An Eternal Golden Braid


#7
wim DC

wim DC

    Writes binary right handed and hex left handed

  • Members
  • PipPipPipPipPipPipPipPipPip
  • 2,084 posts
  • Programming Language:Java, JavaScript, PL/SQL
  • Learning:Java

gregwarner said:

Looks like my "brilliant" idea just added another couple days of work.
As to quote yourself: It always takes longer than you expect, even when you take into account Hofstadter's Law. ^^

#8
gregwarner

gregwarner

    Programming God

  • Members
  • PipPipPipPipPipPipPip
  • 853 posts
  • Location:Arkansas

wim DC said:

As to quote yourself: It always takes longer than you expect, even when you take into account Hofstadter's Law. ^^

Kudos for reminding me of that. :)
Hofstadter's Law: It always takes longer than you expect, even when you take into account Hofstadter's Law.

– Douglas Hofstadter, Gödel, Escher, Bach: An Eternal Golden Braid





1 user(s) are reading this topic

0 members, 1 guests, 0 anonymous users