Jump to content

Efficient Searching

- - - - -

This topic has been archived. This means that you cannot reply to this topic.
3 replies to this topic

#1
John

John

    Writes binary right handed and hex left handed

  • Moderators
  • 6,321 posts
Well I have a dictionary file (dictionary.txt) which contains several thousand words each on its own line. My question is, what is the most efficient way to search through the file?

Essentially I have a method that passes in a word string, and I need to check whether the string that is passed in is valid or not.

As of right now, I have a very simple (and probably very non efficient) iterative search. Essentially it is a loop that checks each line of the file and compares it to the string that was passed in. If there is a match, the loop breaks and returns true. If it is false (it checks every line in the file for nothing) and returns false.

I took a quick glance in one of my text books and found that a binary search tree looks like it would be much more efficient.

I was wondering if someone could give a brief overview of searching efficiency, in terms of searching sorted lists and unsorted lists using iterations, recursion or any other method.

#2
WingedPanther

WingedPanther

    A spammer's worst nightmare

  • Moderators
  • 16,831 posts
It depends on the dictionary file. If the words are in alphabetical order, then a binary search could work very well. If you have an unsorted list, you will have to determine if the overhead of sorting the list will be worth the gains in your searching.

Briefly, to do an iterative search is O(n), a binary search is O(ln(n)), but sorting can be O(n ln(n)) or higher.
Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

#3
John

John

    Writes binary right handed and hex left handed

  • Moderators
  • 6,321 posts
After looking into the matter, a BST would be just as inefficient as a linear search. Essentially, BST's gain their advantage because their depth is short making say, a search with a million elements just a matter of 20 searches. However using a generic BST everytime you add another element to the tree it will add it to the "right" of the previous element because it is "bigger" than the previous. For example say the dictionary file contains the numbers 1-5. When the tree is created it would look like


1

  \

    2

      \

       3

         \

          4

            \

             5

Which is a depth of 5, which would yield the same efficency as a linear search - having to search through all 5 elements for the worst case. Ideally the the best tree would look like:


         3

        / \

       2   4

      /      \

    1         5

With a depth of three, and only need to search three times. However, using a dictionary file along with a generic BST wouldnt create the ideal tree. You could of course create an abstract BST to make it efficient, but after research, for a dictionary, a Trie would work the best.

#4
WingedPanther

WingedPanther

    A spammer's worst nightmare

  • Moderators
  • 16,831 posts
That, of course, is the problem with trees. You have to periodicly invest the overhead to balance it, which tends to be an expensive process. I guess it's a question of what's more important and more common, searching or modifying the contents.
Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog