I'm currently developing an "intelligent" web app that can understand certain sentences related to schedules, news feeds, and other related things (not unlike Siri). For example, if I tell the app:
I ran 4.5 miles today in 37 minutes and 15 seconds.
then I want it to log the run length and duration so I can pull it up later with a query. The way I'd like to do this is by using regular expressions to pull the data from the sentence. Ideally I'd like to be able to have a registry containing regexes resembling something like the following:
I ran (length) (date) (duration).
and have the app know what to look for. Basically, this would be a specialized version of what regex calls a character class, except it would be a "word class".
Is there some obvious way to attack this or is it just pie-in-the-sky thinking? This is a personal project and I'm fully prepared for some late nights, so bring it on! ^^
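A minimal sketch of the "word class" registry described above, written in Python purely for illustration. The names WORD_CLASSES, compile_template, and extract are hypothetical, and the regex fragments are assumptions that cover only the example sentence, not a general grammar. Each (name) placeholder in a template is expanded into a named capture group.

```python
import re

# Hypothetical word classes: each name maps to a regex fragment.
# The fragments are illustrative and far from exhaustive.
WORD_CLASSES = {
    "length": r"\d+(?:\.\d+)?\s*(?:miles?|mi|km)",
    "date": r"today|yesterday",
    "duration": r"in\s+\d+\s+minutes?(?:\s+and\s+\d+\s+seconds?)?",
}

# Splits a template into placeholders, whitespace runs, and literal text.
TOKEN = re.compile(r"(\(\w+\)|\s+)")


def compile_template(template, classes=WORD_CLASSES):
    """Turn 'I ran (length) (date) (duration).' into a compiled regex."""
    parts = []
    for token in TOKEN.split(template):
        if not token:
            continue
        if token.isspace():
            parts.append(r"\s+")  # any run of whitespace
        elif token.startswith("(") and token[1:-1] in classes:
            name = token[1:-1]
            parts.append("(?P<%s>%s)" % (name, classes[name]))
        else:
            parts.append(re.escape(token))  # literal text such as "ran" or "."
    return re.compile("".join(parts), re.IGNORECASE)


def extract(template, sentence):
    """Return the captured word classes as a dict, or None if no match."""
    match = compile_template(template).fullmatch(sentence.strip())
    return match.groupdict() if match else None


result = extract(
    "I ran (length) (date) (duration).",
    "I ran 4.5 miles today in 37 minutes and 15 seconds.",
)
print(result)
```

For the example sentence this captures "4.5 miles", "today", and "in 37 minutes and 15 seconds". A sentence that reorders those parts returns None, which is the main limitation of a fixed template.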
#1
Posted 14 December 2011 - 03:55 PM
#2
Posted 14 December 2011 - 04:06 PM
Rather than doing regular expressions, I would start by identifying units of measure.
#3
Posted 14 December 2011 - 04:11 PM
Do you mean incorporating the units directly into the regular expressions? I would do that, except I expect to be able to dynamically add units of measure from the user interface. Also, it's not just quantities I'd like to be able to recognize... for example:
(date) (time) I have a meeting with (people).
At (date) (time) remember to (phrase).
where (phrase) can be something like "text me 'good morning'". In other words, I'd love to preprogram units of measure, but that would limit the future flexibility of the app. I'm hoping to be able to create not only new words but entire new parts of speech on the fly.
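One way the "add new word classes on the fly" requirement might look, sketched in Python for illustration. The Registry class and its methods are hypothetical, and the fragments for date, time, people, and phrase are rough assumptions. Templates are compiled at parse time, so a class added later (for example from a UI) takes effect immediately.

```python
import re

TOKEN = re.compile(r"(\(\w+\)|\s+)")


class Registry:
    """Hypothetical runtime registry of word classes and sentence templates."""

    def __init__(self):
        self.classes = {}    # class name -> regex fragment
        self.templates = []  # (intent name, template string)

    def add_class(self, name, fragment):
        re.compile(fragment)  # raises re.error early if the fragment is invalid
        self.classes[name] = fragment

    def add_template(self, intent, template):
        self.templates.append((intent, template))

    def _to_regex(self, template):
        out = []
        for tok in TOKEN.split(template):
            if not tok:
                continue
            if tok.isspace():
                out.append(r"\s+")
            elif tok.startswith("(") and tok[1:-1] in self.classes:
                out.append("(?P<%s>%s)" % (tok[1:-1], self.classes[tok[1:-1]]))
            else:
                # Literal text; an unknown (class) is also treated literally.
                out.append(re.escape(tok))
        return re.compile("".join(out), re.IGNORECASE)

    def parse(self, sentence):
        """Return (intent, captures) for the first matching template, else None."""
        for intent, template in self.templates:
            m = self._to_regex(template).fullmatch(sentence.strip())
            if m:
                return intent, m.groupdict()
        return None


registry = Registry()
registry.add_template("meeting", "(date) (time) I have a meeting with (people).")
registry.add_template("reminder", "At (date) (time) remember to (phrase).")
registry.add_class("date", r"today|tomorrow|(?:mon|tues|wednes|thurs|fri|satur|sun)day")
registry.add_class("time", r"\d{1,2}(?::\d{2})?\s*(?:am|pm)")
registry.add_class("phrase", r".+")  # free text: anything up to the final period

sentence = "Tomorrow 3pm I have a meeting with Alice and Bob."
before = registry.parse(sentence)  # (people) is not defined yet
registry.add_class("people", r"\w+(?:(?:,\s*|\s+and\s+)\w+)*")
after = registry.parse(sentence)   # now the meeting template matches
print(before, after)
```

The free-text (phrase) class only works here because it sits at the end of the template. A catch-all class in the middle of a sentence would need stricter boundaries.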
#4
Posted 14 December 2011 - 04:14 PM
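A regular expression for preg_split: '#[\\s.]#' (note that splitting on every period will also break decimals such as "2.2", so you may prefer to split on whitespace only).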
I would personally parse each phrase separately.
Ignore "I"
If you find "I verbed" you can assume it is an activity.
You can then look for a number such as "2.2" and check the characters after it (mi, miles, or nothing), warning if no recognized unit is found. Regular expressions can also be used to validate dates if you want to ignore garbage data: "it was rainy monday and I ran 2.2 miles" becomes "ran, 2.2 miles, monday = 12/05/11".
If two times are found anywhere, you can assume they describe a difference (from 2:29pm to 3:00pm = 31 minutes).
Two time units (minutes + seconds) can be combined in the end with separate rules.
I am sure that if this were a large, broad service you could employ various grammar parsers; for a personal project, however, I'd hand-code a parser with some basic rules.
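The rules above can be sketched roughly as follows, in Python for illustration rather than the PHP implied by preg_split. The unit tables, the parse function, and the result layout are all hypothetical. The "I verbed" rule is implemented naively as "the word after I", and weekdays are reported by name only, not resolved to a calendar date.

```python
import re

# Hypothetical unit tables; in the app these would be user-editable.
DISTANCE_UNITS = {"mi": "miles", "mile": "miles", "miles": "miles", "km": "km"}
SECONDS_PER = {"hour": 3600, "hours": 3600, "min": 60, "minute": 60,
               "minutes": 60, "sec": 1, "second": 1, "seconds": 1}
WEEKDAYS = {"monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"}

QUANTITY = re.compile(r"\b(\d+(?:\.\d+)?)\s*([a-z]+)?", re.IGNORECASE)
CLOCK = re.compile(r"\b(\d{1,2}):(\d{2})\s*(am|pm)\b", re.IGNORECASE)
ACTIVITY = re.compile(r"\bI\s+([a-z]+)")  # naive: the word right after "I"


def clock_to_minutes(hour, minute, ampm):
    """Convert a 12-hour clock time to minutes since midnight."""
    hour24 = int(hour) % 12 + (12 if ampm.lower() == "pm" else 0)
    return hour24 * 60 + int(minute)


def parse(sentence):
    result = {"activity": None, "distance": None, "seconds": 0,
              "day": None, "warnings": []}
    m = ACTIVITY.search(sentence)
    if m:
        result["activity"] = m.group(1)

    # Exactly two clock times are treated as a start/end pair.
    clocks = [clock_to_minutes(*c) for c in CLOCK.findall(sentence)]
    if len(clocks) == 2:
        result["seconds"] += (clocks[1] - clocks[0]) * 60
    rest = CLOCK.sub(" ", sentence)  # keep clock digits out of the next pass

    # Each number is classified by the word that follows it.
    for number, unit in QUANTITY.findall(rest):
        key = unit.lower()
        if key in DISTANCE_UNITS:
            result["distance"] = (float(number), DISTANCE_UNITS[key])
        elif key in SECONDS_PER:
            # Minutes and seconds are combined into one total.
            result["seconds"] += float(number) * SECONDS_PER[key]
        else:
            result["warnings"].append("no recognized unit after %s" % number)

    for word in re.findall(r"[a-z]+", rest.lower()):
        if word in WEEKDAYS:
            result["day"] = word
    return result


run = parse("I ran 4.5 miles today in 37 minutes and 15 seconds.")
print(run)
```

Because each rule scans the whole sentence independently, word order does not matter: "it was rainy monday and I ran 2.2 miles" yields the same fields as a tidier phrasing. The cost is that unrelated numbers in the sentence produce warnings.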
#5
Posted 15 December 2011 - 04:47 AM
I don't think regular expressions, in general, are necessarily the best tool. Try coming up with a bunch of different valid phrases to parse, first, and perhaps you'll see why there could be some difficulty.
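A small Python harness can make that exercise concrete. The pattern below is a deliberately rigid, hypothetical stand-in for "I ran (length) (date) (duration).", and the paraphrases are invented examples, not data from the app.

```python
import re

# A rigid pattern in the spirit of "I ran (length) (date) (duration)."
PATTERN = re.compile(
    r"I ran (?P<length>\d+(?:\.\d+)? miles) "
    r"(?P<date>today|yesterday) "
    r"(?P<duration>in \d+ minutes(?: and \d+ seconds)?)\."
)

# Invented paraphrases of the same fact.
PHRASES = [
    "I ran 4.5 miles today in 37 minutes and 15 seconds.",
    "Today I ran 4.5 miles in 37 minutes and 15 seconds.",
    "I ran for 37 minutes today and covered 4.5 miles.",
    "Ran 4.5 mi this morning, 37:15.",
]


def coverage(pattern, phrases):
    """Pair each phrase with whether the pattern matches it in full."""
    return [(p, pattern.fullmatch(p) is not None) for p in phrases]


for phrase, ok in coverage(PATTERN, PHRASES):
    print("match   " if ok else "no match", phrase)
```

Only the first phrase matches. Each of the others would need its own template or a looser, order-independent approach.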