Abstract Data Types (ADTs) and code efficiency are two of the most important concepts when moving from basic coding in a language and beginning to create powerful applications. Many of you will never actually use these concepts directly, but if you want to be a professional programmer, you need to be aware of them. If you have used any libraries of any power (.NET, Boost, wxWidgets, Borland's GUI, Lazarus, etc), you have indirectly made use of both of these concepts.
Code efficiency is the first topic, and it deals with how gracefully a piece of code will grow. When we write programs, we tend to create code in a very "nice" environment. We sort arrays of 10 elements, we search for items in a text files with 20 lines, etc. In cases like these, it doesn't really matter how wretched your code is, it will still complete in the blink of an eye. Unfortunately, in the real world, we don't deal with data that small, we deal with arrays with thousands of elements, and text files that contain the US tax code.
To give an example of what I'm talking about, consider these to functions to find a value in an array of ints (that are in ascending order):
badsearch() is the obvious way to find an element of an array, go through the array, element by element, until you find what you're looking for. When you find it, you can return its index and you're done. Here's the problem, on average, you have to go halfway through the array. If you have 10 elements, on average you'll find it on about the 5th element. That sounds pretty good, but consider this: if you have one million elements, you will find it on about the 500,000th element. What this means is that if you have X elements, it will take X/2 comparisons, on average, to find the element. While this might not seem bad, realize that every time you multiply the amount of data by 1000, the search time is also multiplied by 1000.Code:int badsearch(int* array, int size, int val) { for (int i=0;i<size;i++) if (array[i]==val) return i; return -1; } int goodsearch(int* array, int size, int val) { int i=0,j=size-1; while (i<=j) { if (array[(i+j)/2]==val) return (i+j)/2 else if (array[(i+j)/2]<val) i=(i+j)/2+1 else j=(i+j)/2-1 } return -1; }
goodsearch(), on the other hand, takes a different approach. It takes advantage of the array being sorted and cuts range of possibilities in half with every 2 comparisons (one to see if we found it, one to see which half it's on). What this does is allow us to double the number of elements at the cost of only 2 more comparisons. This means multiplying the number of elements by 1024 costs us an additional 2*10 (since 1024 = 2^10) comparisons. Using log_2 as log base 2, this means the worst-case number of comparisons is 2*log_2(X) for an X element array. This means that while 10 elements would take 6 comparisons, one million elements would take only 40 comparisons.
Looking at a chart of values to see how they comparisons change as the array size grows:
For every algorithm, we can create a function that measures the number of times a line of code gets run. badsearch() has an average number of line executions of X/2+1, where X is the number of elements in the array. goodsearch() has a worst-case number of line executions of 2*log_2(X)+1. In general, we measure how efficient an algorithm is by looking at the fastest growing term for each (X/2 for badsearch(), 2*log_2(X) for goodsearch()), and ignoring any multipliers or dividers by a constant. We use O() (read big-Oh) to represent that piece. badsearch() has O(X), goodsearch has O(log_2(x)).Code:Array Size | 10 | 100 | 1,000 | 10,000 | 100,000 | 1,000,000 badsearch ave if calls | 5 | 50 | 500 | 5,000 | 50,000 | 500,000 goodsearch max if calls | 6 | 14 | 20 | 26 | 34 | 40
Rating these from best to worst, O(1) is best, then O(log(log(X))), O(log(X)), O(X), O(X*log(X)), O(X^2), O(X^3), O(X^n), O(e^X), O(X!). There are other values as well, but these are the most common you will see. Determining which big-Oh a function corresponds to can be challenging at best, insane at worst. Also, log doesn't care what the base is, they are all equivalent.
When looking at Abstract Data Types, we will be looking at different options for how to make them work. What will often happen is there will be a trade-off, where one feature will get a higher big-Oh to give another feature a lower big-Oh.
That brings up an important question: what the heck is an Abstract Data Type anyway? An Abstract Data Type (ADT), is a data structure (like an array) that has specific functionality defined, but no specification of HOW the functionality is provided. Common ADTs are: Stack, Queue, Lists, Strings, Trees, Heaps, Tables, and Graphs.
As an example: a stack stores information like a stack of plates. You can use the "push" function to add data to the top of the stack, and you can use the "pop" function to pull the topmost piece of data from the stack. This is much like the stack of plates where you can keep adding plates on top, but if you want to get to the bottom plate, you have to take all the other plates off it (pop them off, one by one). What makes a stack abstract is the fact that you can choose most ANY method of actually storing the data. An array or a linked list are common examples. We will look at how to implement various ADTs in future tutorials.
Last edited by WingedPanther; 12-24-2008 at 08:18 AM. Reason: fix goodsearch
Simply brilliant! I believe goodsearch() is bad though. I ran test data through it using an array of 10 (1,2,3....10) and a val of 8. I believe this is incorrect:
Should the -1 and +1 be swapped? However, I may have just not done the example correct as I was running it by hand (not in an actual program).Code:else if (array[(i+j)/2]<val) j=(i+j)/2-1 else i=(i+j)/2+1 }
You are half-right... the entire lines should be swapped. Fixed above.
Interesting read.![]()
What does this mean?
What search algorithm is that? Perhaps write a tutorial on different search algorithms?O(log_2(x))
He mentions:
but I'm not sure what it is nor did I do as he suggested thus I agree with chili5, a tutorial would be great. lolWe use O() (read big-Oh)
O(log_2(X)) is the search efficiency for a balanced binary tree. Roughly, here's how the various efficiencies work:
If X is the size of the problem (records in a database table, number of elements in an array, lines of text in a file, etc), then the numbers under it are a measure of the processing time (in lines of code, milliseconds, seconds, etc). The purpose isn't to provide precise measurements, but to show how rapidly the problem explodes.Code:X | 1 | 5 | 10 | 20 | 100 | 1000 | 10000 O(1) | 1 | 1 | 1 | 1 | 1 | 1 | 1 O(log(X)) | 0 | .70 | 1 | 1.30 | 2 | 3 | 4 O(X) | 1 | 5 | 10 | 20 | 100 | 1000 | 10000 O(X^2) | 1 | 25 | 100 | 400 | 10000 | 1000000 | 100000000 O(X^3) | 1 | 125 | 1000 | 8000 | 1000000 O(2^X) | 1 | 32 | 1024 | 1048576 O(X!) | 1 | 120 | 3628800 | ~2432902008176640000
There are currently 1 users browsing this thread. (0 members and 1 guests)
Bookmarks