Programming Theory: Discuss programming theory, algorithm efficiency, logic, and any other category where math and computer science overlap.
Well, it's a lazy Sunday afternoon, a perfect time to noodle over a problem that's been bothering me for years.
I'm looking for an algorithm that will take an arbitrary piece of repeating text and create a template showing which parts are constant and which parts vary. I'll give several examples.
Fixed width text:
Code:
Fred     Mbago  27 Fleet St. Florence   MI 30444
Terrence Swinge 100 Fnox Ln. Ann Arbor  MI 28540
Bastian  Bux    54 Fnoo Ct.  Louisville KY 40333
In this case, I would like the algorithm to create a pattern that shows which characters on a single line vary and which remain constant, perhaps the following regular expression:
Code:
(........) (......) (............) (..........) (..) (.....)
The same data, but comma delimited, would look like this:
Code:
Fred,,Mbago,27 Fleet St.,Florence,MI,30444
Terrence,,Swinge,100 Fnox Ln.,Ann Arbor,MI,28540
Bastian,,Bux,54 Fnoo Ct.,Louisville,KY,40333
Code:
(.+),,(.+),(.+),(.+),(..),(.+)
Code:
Address:
Fred
Mbago
27 Fleet St.
Florence
MI
30444
Address:
Terrence
Swinge
100 Fnox Ln.
Ann Arbor
MI
28540
Address:
Bastian
Bux
54 Fnoo Ct.
Louisville
KY
40333
Code:
/Address:\n\s+(.+)\n\s+(.+)\n\s+(.+)\n\s+(.+)\n\s+(.+)\n\s+(.+)\n/m
I'm thinking that some sort of stochastic algorithm would be good: a genetic algorithm or simulated annealing, maybe. Break the original text down into small blocks, start putting blocks together at random, and see how much of the text as a whole they fit.
I'm not set on writing something like this myself, although I will if I have to. If someone can point me to the right keywords to search for on Google, that would be fine; I just don't know exactly what to look for right now.
Anyway... something to chew on.
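For the comma-delimited example, a deterministic baseline may be worth trying before anything stochastic. The sketch below is only an illustration under strong assumptions: it supposes the records share a single-character delimiter that appears the same number of times in every record, and the function name `infer_delimited_template` is made up for this example. It picks the most frequent such character, splits every record on it, and keeps a field literal only when it is identical in all records.

```python
import re
from collections import Counter

def infer_delimited_template(records):
    """Guess a single-character delimiter and build a regex template."""
    # Count the non-alphanumeric characters in each record.
    counts = [Counter(ch for ch in rec if not ch.isalnum()) for rec in records]
    # Candidate delimiters occur the same number of times in every record.
    candidates = {
        ch: n for ch, n in counts[0].items()
        if all(c[ch] == n for c in counts)
    }
    if not candidates:
        raise ValueError("no consistent delimiter found")
    # Assumption: the most frequent candidate is the field delimiter.
    delim = max(candidates, key=candidates.get)
    columns = zip(*(rec.split(delim) for rec in records))
    parts = []
    for values in columns:
        if len(set(values)) == 1:
            parts.append(re.escape(values[0]))  # constant field: keep literal
        else:
            parts.append("(.+)")                # varying field: capture it
    return re.escape(delim).join(parts)

sample = [
    "Fred,,Mbago,27 Fleet St.,Florence,MI,30444",
    "Terrence,,Swinge,100 Fnox Ln.,Ann Arbor,MI,28540",
    "Bastian,,Bux,54 Fnoo Ct.,Louisville,KY,40333",
]
print(infer_delimited_template(sample))
```

On the three sample records this yields a six-group pattern with the empty second field kept as a literal `,,`. It does not narrow the state column to `(..)`; that would need an extra check for fields whose values all have the same length.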
I'm assuming you want this to be dynamic enough to handle "any" pattern fed into the template generator, without having to select delimiters or anything like that. To do that, I think you'll first need to define the different types of characters a pattern could contain: alphanumeric characters, special characters (!, @, #, $, %...), and whitespace or escape characters (\n, \t, \r, \s...). I'm not sure what the best way to do this would be, but off the top of my head, it seems like you need to go through record by record and keep track of the position and character type of each character in the record. Then, as you go through each row in your file, compare the positions and character types to the ones you tracked for the previous rows.
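The position-by-position comparison described above could be sketched roughly as follows for the fixed-width case. This is a minimal illustration, not a general solution: it assumes every record is one line with fields at the same character positions, it distinguishes only two character types (digits and everything else), and the name `infer_positional_template` is invented for the example. Columns that hold the same character in every record become literals; runs of varying columns become capture groups.

```python
import re
import string

def infer_positional_template(records):
    """Build a regex from columns that stay constant across all records."""
    width = max(len(r) for r in records)
    padded = [r.ljust(width) for r in records]  # pad short records with spaces
    pieces = []
    run = []  # regex atoms for the current run of varying columns

    def flush():
        if run:
            pieces.append("(" + "".join(run) + ")")
            run.clear()

    for column in zip(*padded):
        if len(set(column)) == 1:
            flush()
            pieces.append(re.escape(column[0]))  # same character in every row
        elif all(ch in string.digits for ch in column):
            run.append(r"\d")                    # varies, but always a digit
        else:
            run.append(".")                      # varies, mixed character types
    flush()
    return "".join(pieces)

rows = [
    ("Fred", "Mbago", "27 Fleet St.", "Florence", "MI", "30444"),
    ("Terrence", "Swinge", "100 Fnox Ln.", "Ann Arbor", "MI", "28540"),
    ("Bastian", "Bux", "54 Fnoo Ct.", "Louisville", "KY", "40333"),
]
# Field widths taken from the regular expression in the original post.
sample = ["%-8s %-6s %-12s %-10s %-2s %-5s" % row for row in rows]
print(infer_positional_template(sample))
```

With only three records, a column inside a field can look constant by coincidence, so more sample rows should give a more reliable template. The approach also breaks down for delimited or multi-line records, where the same field does not sit at the same position in every record.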
Anyway, I've never had the opportunity to use them, but Perl's formats might help with something like this.