Jump to content

String searching algorithm

- - - - -

This topic has been archived. This means that you cannot reply to this topic.
4 replies to this topic

#1
DarkLordoftheMonkeys

DarkLordoftheMonkeys

    Programming Professional

  • Members
  • PipPipPipPipPip
  • 255 posts
Don't read this thread. The scripts are crap and they don't work properly. I've put the fixed version here:
http://forum.codecal...-algorithm.html


This is the first step in a regular expression parser that I'm in the process of writing. It's going to take a while, but I've got a straight string search done. The next step will be to build a Knuth-Morris-Pratt search and a Boyar-Moore search based on this one (for the purpose of speed optimization), then figure out how to parse metacharacters in the regular expression. The script takes two inputs: $1 is the substring to search for and $2 is the string to search. The output of the script is the beginning index of the substring. It goes like this:

#!/bin/bash
# straight string search algorithm
declare -i index=-1;
for((i=0; i < ${#2}; i++))
do
	for((j=0; j < ${#1}; j++))
	# loop for individual substring comparisons
	do
		if [ "${1:j:1}" = "${2:j+i:1}" ]
		# index is set to j+i, for obvious reasons.
		then
			index=i;
		else
			break;
		fi
	done
	if [ $index -ne -1 ]
	then
		break;
		# close to the Knuth-Morris-Pratt method, only it doesn't jump ahead
	fi
done
echo $index;
unset index;

Edited by DarkLordoftheMonkeys, 12 November 2009 - 01:42 PM.

Life's too short to be cool. Be a nerd.

#2
DarkLordoftheMonkeys

DarkLordoftheMonkeys

    Programming Professional

  • Members
  • PipPipPipPipPip
  • 255 posts
Update:
I just modified the algorithm so it does a Knuth-Morris-Pratt search instead of a straight string search. This should make it faster as it has to do fewer comparisons. I only had to add one line of code.


#!/bin/bash

# straight string search algorithms

declare -i index=-1;

for((i=0; i < ${#2}; i++))

do

	for((j=0; j < ${#1}; j++))

	# loop for individual substring comparisons

	do

		if [ "${1:j:1}" = "${2:j+i:1}" ]

		# index is set to j+i, for obvious reasons.

		then

			index=i;

		else

			break;

		fi

	done

	if [ $index -ne -1 ]

	then

		i+=j;

		break;

		# Knuth-Morris-Pratt method

	fi

done

echo $index;

unset index;


Life's too short to be cool. Be a nerd.

#3
DarkLordoftheMonkeys

DarkLordoftheMonkeys

    Programming Professional

  • Members
  • PipPipPipPipPip
  • 255 posts
Here's the Boyar-Moore version of the string search. This one does it even faster.


#!/bin/bash

# Boyar-Moore string search

declare -i index=-1;

for((i=0; i < ${#2}; i++))

do

	for((j=${#1}-1; j >= 0; j--))

	# loop for individual substring comparisons

	do

		if [ "${1:j:1}" = "${2:i+j:1}" ]

		# compares backward

		then

			index=i;

		else

			break;

		fi

	done

	if [ $index -ne -1 ]

	then

		i+=${#1};

		# skips ahead one whole substring length

		break;

	fi

done

echo $index;

unset index;


Life's too short to be cool. Be a nerd.

#4
DarkLordoftheMonkeys

DarkLordoftheMonkeys

    Programming Professional

  • Members
  • PipPipPipPipPip
  • 255 posts
I found a major bug in the script. I'm trying to fix it right now.
Life's too short to be cool. Be a nerd.

#5
DarkLordoftheMonkeys

DarkLordoftheMonkeys

    Programming Professional

  • Members
  • PipPipPipPipPip
  • 255 posts
Please disregard everything in this thread. The scripts are crap. I just fixed the algorithm and I'm going to post it in another thread.
Life's too short to be cool. Be a nerd.