+ Reply to Thread
Results 1 to 6 of 6

Thread: Abstract Data Types and Code Efficiency.

  1. #1
    Super Moderator WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther's Avatar
    Join Date
    Jul 2006
    Age
    37
    Posts
    13,155
    Blog Entries
    59

    Abstract Data Types and Code Efficiency.

    Abstract Data Types (ADTs) and code efficiency are two of the most important concepts when moving from basic coding in a language and beginning to create powerful applications. Many of you will never actually use these concepts directly, but if you want to be a professional programmer, you need to be aware of them. If you have used any libraries of any power (.NET, Boost, wxWidgets, Borland's GUI, Lazarus, etc), you have indirectly made use of both of these concepts.

    Code efficiency is the first topic, and it deals with how gracefully a piece of code will grow. When we write programs, we tend to create code in a very "nice" environment. We sort arrays of 10 elements, we search for items in a text files with 20 lines, etc. In cases like these, it doesn't really matter how wretched your code is, it will still complete in the blink of an eye. Unfortunately, in the real world, we don't deal with data that small, we deal with arrays with thousands of elements, and text files that contain the US tax code.

    To give an example of what I'm talking about, consider these to functions to find a value in an array of ints (that are in ascending order):
    Code:
    int badsearch(int* array, int size, int val)
    {
      for (int i=0;i<size;i++)
        if (array[i]==val) return i;
      return -1;
    }
    
    int goodsearch(int* array, int size, int val)
    {
      int i=0,j=size-1;
      while (i<=j)
      {
        if (array[(i+j)/2]==val)
          return (i+j)/2
        else if (array[(i+j)/2]<val)
          i=(i+j)/2+1
        else
          j=(i+j)/2-1
      }
      return -1;
    }
    badsearch() is the obvious way to find an element of an array, go through the array, element by element, until you find what you're looking for. When you find it, you can return its index and you're done. Here's the problem, on average, you have to go halfway through the array. If you have 10 elements, on average you'll find it on about the 5th element. That sounds pretty good, but consider this: if you have one million elements, you will find it on about the 500,000th element. What this means is that if you have X elements, it will take X/2 comparisons, on average, to find the element. While this might not seem bad, realize that every time you multiply the amount of data by 1000, the search time is also multiplied by 1000.

    goodsearch(), on the other hand, takes a different approach. It takes advantage of the array being sorted and cuts range of possibilities in half with every 2 comparisons (one to see if we found it, one to see which half it's on). What this does is allow us to double the number of elements at the cost of only 2 more comparisons. This means multiplying the number of elements by 1024 costs us an additional 2*10 (since 1024 = 2^10) comparisons. Using log_2 as log base 2, this means the worst-case number of comparisons is 2*log_2(X) for an X element array. This means that while 10 elements would take 6 comparisons, one million elements would take only 40 comparisons.

    Looking at a chart of values to see how they comparisons change as the array size grows:
    Code:
    Array Size              | 10 | 100 | 1,000 | 10,000 | 100,000 | 1,000,000
    badsearch ave if calls  |  5 |  50 |   500 |  5,000 |  50,000 |   500,000
    goodsearch max if calls |  6 |  14 |    20 |     26 |      34 |        40
    For every algorithm, we can create a function that measures the number of times a line of code gets run. badsearch() has an average number of line executions of X/2+1, where X is the number of elements in the array. goodsearch() has a worst-case number of line executions of 2*log_2(X)+1. In general, we measure how efficient an algorithm is by looking at the fastest growing term for each (X/2 for badsearch(), 2*log_2(X) for goodsearch()), and ignoring any multipliers or dividers by a constant. We use O() (read big-Oh) to represent that piece. badsearch() has O(X), goodsearch has O(log_2(x)).

    Rating these from best to worst, O(1) is best, then O(log(log(X))), O(log(X)), O(X), O(X*log(X)), O(X^2), O(X^3), O(X^n), O(e^X), O(X!). There are other values as well, but these are the most common you will see. Determining which big-Oh a function corresponds to can be challenging at best, insane at worst. Also, log doesn't care what the base is, they are all equivalent.

    When looking at Abstract Data Types, we will be looking at different options for how to make them work. What will often happen is there will be a trade-off, where one feature will get a higher big-Oh to give another feature a lower big-Oh.

    That brings up an important question: what the heck is an Abstract Data Type anyway? An Abstract Data Type (ADT), is a data structure (like an array) that has specific functionality defined, but no specification of HOW the functionality is provided. Common ADTs are: Stack, Queue, Lists, Strings, Trees, Heaps, Tables, and Graphs.

    As an example: a stack stores information like a stack of plates. You can use the "push" function to add data to the top of the stack, and you can use the "pop" function to pull the topmost piece of data from the stack. This is much like the stack of plates where you can keep adding plates on top, but if you want to get to the bottom plate, you have to take all the other plates off it (pop them off, one by one). What makes a stack abstract is the fact that you can choose most ANY method of actually storing the data. An array or a linked list are common examples. We will look at how to implement various ADTs in future tutorials.
    Last edited by WingedPanther; 12-24-2008 at 08:18 AM. Reason: fix goodsearch
    CodeCall Blog | CodeCall Wiki
    Programming is a branch of mathematics.
    My CodeCall Blog | My Personal Blog

  2. #2
    Administrator Jordan is a name known to all Jordan is a name known to all Jordan is a name known to all Jordan is a name known to all Jordan is a name known to all Jordan is a name known to all Jordan's Avatar
    Join Date
    Nov 2005
    Location
    Hendersonville, NC
    Posts
    24,750
    Blog Entries
    97

    Re: Abstract Data Types and Code Efficiency.

    Simply brilliant! I believe goodsearch() is bad though. I ran test data through it using an array of 10 (1,2,3....10) and a val of 8. I believe this is incorrect:

    Code:
        else if (array[(i+j)/2]<val)
          j=(i+j)/2-1
        else
          i=(i+j)/2+1
      }
    Should the -1 and +1 be swapped? However, I may have just not done the example correct as I was running it by hand (not in an actual program).

  3. #3
    Super Moderator WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther's Avatar
    Join Date
    Jul 2006
    Age
    37
    Posts
    13,155
    Blog Entries
    59

    Re: Abstract Data Types and Code Efficiency.

    You are half-right... the entire lines should be swapped. Fixed above.
    CodeCall Blog | CodeCall Wiki
    Programming is a branch of mathematics.
    My CodeCall Blog | My Personal Blog

  4. #4
    Code Slinger chili5 has a reputation beyond repute chili5 has a reputation beyond repute chili5 has a reputation beyond repute chili5 has a reputation beyond repute chili5 has a reputation beyond repute chili5 has a reputation beyond repute chili5 has a reputation beyond repute chili5 has a reputation beyond repute chili5 has a reputation beyond repute chili5 has a reputation beyond repute chili5 has a reputation beyond repute chili5's Avatar
    Join Date
    Mar 2008
    Posts
    7,042

    Re: Abstract Data Types and Code Efficiency.

    Interesting read.

    What does this mean?

    O(log_2(x))
    What search algorithm is that? Perhaps write a tutorial on different search algorithms?
    "Whenever you remember, I'll be there/
    Remember how we reached that dream together" - Carrie Underwood

  5. #5
    Administrator Jordan is a name known to all Jordan is a name known to all Jordan is a name known to all Jordan is a name known to all Jordan is a name known to all Jordan is a name known to all Jordan's Avatar
    Join Date
    Nov 2005
    Location
    Hendersonville, NC
    Posts
    24,750
    Blog Entries
    97

    Re: Abstract Data Types and Code Efficiency.

    He mentions:

    We use O() (read big-Oh)
    but I'm not sure what it is nor did I do as he suggested thus I agree with chili5, a tutorial would be great. lol

  6. #6
    Super Moderator WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther's Avatar
    Join Date
    Jul 2006
    Age
    37
    Posts
    13,155
    Blog Entries
    59

    Re: Abstract Data Types and Code Efficiency.

    O(log_2(X)) is the search efficiency for a balanced binary tree. Roughly, here's how the various efficiencies work:

    Code:
    X         | 1 |   5 |      10 |                   20 |     100 |    1000 |     10000
    O(1)      | 1 |   1 |       1 |                    1 |       1 |       1 |         1
    O(log(X)) | 0 | .70 |       1 |                 1.30 |       2 |       3 |         4
    O(X)      | 1 |   5 |      10 |                   20 |     100 |    1000 |     10000
    O(X^2)    | 1 |  25 |     100 |                  400 |   10000 | 1000000 | 100000000
    O(X^3)    | 1 | 125 |    1000 |                 8000 | 1000000
    O(2^X)    | 1 |  32 |    1024 |              1048576
    O(X!)     | 1 | 120 | 3628800 | ~2432902008176640000
    If X is the size of the problem (records in a database table, number of elements in an array, lines of text in a file, etc), then the numbers under it are a measure of the processing time (in lines of code, milliseconds, seconds, etc). The purpose isn't to provide precise measurements, but to show how rapidly the problem explodes.
    CodeCall Blog | CodeCall Wiki
    Programming is a branch of mathematics.
    My CodeCall Blog | My Personal Blog

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

     

Similar Threads

  1. Replies: 3
    Last Post: 10-01-2009, 11:48 AM
  2. Replies: 2
    Last Post: 11-18-2008, 07:51 AM
  3. Data types of PL/SQL and SQL
    By Patrick in forum Database & Database Programming
    Replies: 0
    Last Post: 10-06-2007, 11:52 PM
  4. Abstract data types in C++
    By priorityone in forum C and C++
    Replies: 1
    Last Post: 01-08-2007, 09:43 AM