Jump to content


Check out our Community Blogs

Register and join over 40,000 other developers!


Recent Status Updates

View All Updates

Photo
- - - - -

Searching and Inserting: Arrays vs Linked Lists

linked list array

  • Please log in to reply
16 replies to this topic

#1 WingedPanther73

WingedPanther73

    A spammer's worst nightmare

  • Moderator
  • 17757 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others

Posted 11 January 2009 - 03:31 PM

In previous tutorials we talked about arrays and linked lists as mechanisms for storing list data. In this tutorial, we'll look at some properties of arrays and linked lists, and try to find a way to get the best of each structure.

There are two basic activities that will introduce challenges to the usefulness of arrays and linked lists: searching and inserting data. We'll talk about searching first, as it will motivate the need to carefully insert data.

Let's say you have a list of numbers: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19. A typical activity is based on the desire to search that list for a particular value that may or may not be in the list and return its location. If I asked you to find the location of '9', for example, you would probably scan across the list of values until you spotted it, and announce "It's right there!" You might even say "It's the sixth number on the list!" This is precisely the strategy of searching that is used in a linked list.

         +---------+   +---------+   +---------+   +---------+   +---------+   
head --> | 1, next---> | 3, next---> | 5, next---> | 7, next---> | 9, next---> ...
         +---------+   +---------+   +---------+   +---------+   +---------+   

             ^             ^             ^             ^             ^
             |             |             |             |             |
           search 1      search 2      search 3      search 4      found on 5

The code is:
linkedsearch.cpp:
struct node
{
  int data;
  node* next;
};

node* find(node* head,int tofind)
{
  node* temp=head;
  while (temp->data < tofind) 
  {
    temp=temp->next;
  }
  if (temp->data == tofind)
    return temp;
  else
    return NULL;
}

As you can see, we look through the list for the value and return either a pointer to it, or a null pointer to indicate that it wasn't found (when we find a larger value). Since, on average, our value will be in the middle of the list, it will take an average of n/2+1 comparisons (+1 for the final check of = or >) to get our result. This is O(n). To see the average search times, look at the following chart:
list size:    10    100   1,000   10,000   100,000   1,000,000
ave. comps.:   6     51     501    5,001    50,001     500,001

What O(n) tells us is this: once the number of elements is large enough, multiplying the number of elements by y multiplies your metric (in this case average comparisons) by y as well (when focused on significant digits). This introduces an issue for us: while this is good, it also kind of sucks. Consider writing a database application: you have your application and you test it with 1000 records and find that your search for records averages 500 ms. You then release your application to a customer who adds 10,000,000 records and finds that searches are taking 5,000,000 ms (or 5,000 s, or 1 hr, 23 min, 20 s). I can pretty much guarantee that you would not have that customer for long. There has to be a better way to search... and there is.

Look at the list of numbers again: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19. Since it's in order, I can approach the problem from a different perspective: look at the middle value (9 in this case) Is it the value we're looking for? Is it lower than what we're looking for? If it isn't the value, then we can focus our search on the top or bottom half of our list. 2 comparisons throw out half the list. Repeating the process will result in every 2 comparisons chopping the potential chunk of list in half. With arrays, we can find the middle of our list easily. Below is the logic to find 13:

Index bounds are 0 - 9, midpoint is index 4
+----++----++----++----++----++----++----++----++----++----+
|  1 ||  3 ||  5 ||  7 ||  9 || 11 || 13 || 15 || 17 || 19 |
+----++----++----++----++----++----++----++----++----++----+
   0     1     2     3    *4*    5     6     7     8     9

9<13 so index bounds are 5-9, midpoint is index 7
+----++----++----++----++----+
| 11 || 13 || 15 || 17 || 19 |
+----++----++----++----++----+
   5     6    *7*    8     9

15>13 so index bounds are 5-6, midpoint is index 5
+----++----+
| 11 || 13 |
+----++----+
  *5*    6

11<13 so index bounds are 6-6, midpoint is index 6
+----+
| 13 |
+----+
  *6*

found at index 6!

The resulting code is:
arraysearch.cpp:
int find(int* searcharray, int tofind, int size)
{
  int min=0;
  while (min <= size)
  {
    if(searcharray[(min+size)/2]=tofind)
    {
      return (min+size)/2;
    }
    else if(searcharray[(min+size)/2]<tofind)
    {
      min=(min+size)/2+1;
    }
    else
    {
      size=(min+size)/2-1;
    }
  }
  return -1;
}

This search has a worst case scenario of approximately 2*log_2(n) (there may be a + or – 1 in there, but I'm feeling a bit lazy). To see the worst case search times, look at the following chart:
list size:    10    100   1,000   10,000   100,000   1,000,000
ave. comps.:   7     13      20       27        33          40

This is a search that is O(log(n)). What it tells us is that, once the number of elements is large enough, multiplying the number of elements by y adds z to your metric. Here, multiplying by 10 adds about 6.6 comparisons. Repeating the above scenario, you have your application and you test it with 1000 records and find that your search for records averages 500 ms. You then release your application to a customer who adds 10,000,000 records and finds that searches are taking 1175 ms (or 1s 175ms. At that point, your customer may notice that it's "a little slower" than the demo you gave, "but still pretty good".

Based on what we've seen so far, this clearly indicates that arrays are far better than linked lists, and we should immediately stop using linked lists, right? Not so fast! There was a requirement on that array: it had to contain the elements in order! That doesn't just happen by magic. Every time you add a new element to your list of data, there is an overhead cost. If you are not willing to accept that overhead cost, your data will not be sorted, and an array cannot pull that neat trick. An array search would then be just as slow as a linked list search: go through all the data and hope you find it early.

There are two ways to deal with this situation. 1) sort the data before doing a search, or 2) make sure the data remains sorted as you add new values. Since doing a sort is an expensive operation ( O(n*log(n)) or worse, depending on algorithm ), we're going to opt for keeping our data sorted as we add values. This means we need to look at how expensive it will be to insert an item. To keep things simple, we'll just look at the cost of insertion, and remember that there is always going to be an overhead cost (finding where the data goes). The total cost on an insert will be [search cost] + [insert cost], which we will deal with after we look at the cost of an insert.

A linked list is pretty simple, just adjust the pointers (in this case to add 6):
this:
                 +---------+   +---------+   
head --> ... --> | 5, next---> | 7, next---> ...
                 +---------+   +---------+   

                          +---------+
                          | 6, next---> NULL
                          +---------+


becomes:
                 +---------+   +---------+   
head --> ... --> | 5, next |   | 7, next---> ...
                 +-----|---+   +---------+   
                       |               ^
                       |   +---------+ |
                       +-> | 6, next---+
                           +---------+

Here's the code to insert data into a linked list:
linkedinsert.cpp:
node* insert(node* head, int toadd)
{
  node* temp;
  if (head->data < toadd)
  {
    temp=new node;
    temp->data = toadd;
    temp->next = head;
    return temp;
  }
  else
  {
    while ((temp->next!=NULL)&&((temp->next)->data < toadd))
    {
      temp=temp->next;
    }
    if (temp->data < toadd)
    {
      temp->next = new node;
      temp=temp->next;
      temp->data = toadd;
    }
    else
    {
      node* temp2=new node;
      temp2->data = toadd;
      temp2->next = temp->next;
      temp->next = temp2;
      temp=temp2;
    }
    return temp;
  }
}

The cost of this operation is very low. 2 pointer assignments do the job. By contrast, to insert into an array you have to move elements out of the way, like this:
Insert 6 at index 3
+----++----++----++----++----++----++----++----++----++----++----+
|  1 ||  3 ||  5 ||  7 ||  9 || 11 || 13 || 15 || 17 || 19 ||    |
+----++----++----++----++----++----++----++----++----++----++----+
   0     1     2     3     4     5     6     7     8     9    10

move elements to the right to free space
+----++----++----++----++----++----++----++----++----++----++----+
|  1 ||  3 ||  5 ||  7 ||  9 || 11 || 13 || 15 || 17 || 19 || 19 |
+----++----++----++----++----++----++----++----++----++----++----+
   0     1     2     3     4     5     6     7     8     9    10

+----++----++----++----++----++----++----++----++----++----++----+
|  1 ||  3 ||  5 ||  7 ||  9 || 11 || 13 || 15 || 17 || 17 || 19 |
+----++----++----++----++----++----++----++----++----++----++----+
   0     1     2     3     4     5     6     7     8     9    10

+----++----++----++----++----++----++----++----++----++----++----+
|  1 ||  3 ||  5 ||  7 ||  9 || 11 || 13 || 15 || 15 || 17 || 19 |
+----++----++----++----++----++----++----++----++----++----++----+
   0     1     2     3     4     5     6     7     8     9    10

+----++----++----++----++----++----++----++----++----++----++----+
|  1 ||  3 ||  5 ||  7 ||  9 || 11 || 13 || 13 || 15 || 17 || 19 |
+----++----++----++----++----++----++----++----++----++----++----+
   0     1     2     3     4     5     6     7     8     9    10

+----++----++----++----++----++----++----++----++----++----++----+
|  1 ||  3 ||  5 ||  7 ||  9 || 11 || 11 || 13 || 15 || 17 || 19 |
+----++----++----++----++----++----++----++----++----++----++----+
   0     1     2     3     4     5     6     7     8     9    10

+----++----++----++----++----++----++----++----++----++----++----+
|  1 ||  3 ||  5 ||  7 ||  9 ||  9 || 11 || 13 || 15 || 17 || 19 |
+----++----++----++----++----++----++----++----++----++----++----+
   0     1     2     3     4     5     6     7     8     9    10

+----++----++----++----++----++----++----++----++----++----++----+
|  1 ||  3 ||  5 ||  7 ||  7 ||  9 || 11 || 13 || 15 || 17 || 19 |
+----++----++----++----++----++----++----++----++----++----++----+
   0     1     2     3     4     5     6     7     8     9    10

and now add 6 at index 3
+----++----++----++----++----++----++----++----++----++----++----+
|  1 ||  3 ||  5 ||  6 ||  7 ||  9 || 11 || 13 || 15 || 17 || 19 |
+----++----++----++----++----++----++----++----++----++----++----+
   0     1     2     3     4     5     6     7     8     9    10

The code is:
arrayinsert.cpp:
int insert(int* searcharray, int toadd, int size)
{
  int min=0;
  int max=size;
  while (min <= max)
  {
    if(searcharray[(min+max)/2]=toadd)
    {
      min = (min+max)/2;
      for(int i=size;i>min;i--)
      {
        searcharray[i+1]=searcharray[i];
      }
      searcharray[min]=toadd;
      return min;
    }
    else if(searcharray[(min+max)/2]<toadd)
    {
      min=(min+max)/2+1;
    }
    else
    {
      max=(min+max)/2-1;
    }
  }
  min=(min+max)/2;
  if (searcharray[min+1]<toadd)
  {
    min=min+1;
  }
  for(int i=size;i>min;i--)
  {
    searcharray[i+1]=searcharray[i];
  }
  searcharray[min]=toadd;
  return min;
}

Here, the big problem is that we have to move elements out of the way of the new one. The cost of the operation depends on the size of the array and the position in the array, but on average it is about n/2 moves of data. At this point, it is worth noting that these two operations are not easy to compare (as simple as it looked). The reason is that a pointer typically occupies about 4 bytes of data. The result is that assigning two pointers is not just 2 operations, it is 2 cheap operations. By contrast, moving the actual data of interest can be quite expensive. Imagine if each element of the array is a class containing 1 MB of data! If you only have to move 5 elements of the array, that is 5MB of data, compared to 8 B.

The total cost of search plus insert becomes: linked list = n/2 + 1 + 2, array = 2*log_2(n) + (n/2)*(cost of move). Both are O(n) for the total insert, with array being potentially much slower depending on the type of data involved. Based on these two methods for storing data, it looks like you should use arrays if you are mainly searching for data that is small, and you should use linked lists if you have a lot of inserts or the data is large.

In the next tutorial, we'll try to get the best of both worlds :)
  • 3

Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

My MineCraft server site: http://banishedwings.enjin.com/


#2 MathX

MathX

    CC Leader

  • Expert Member
  • PipPipPipPipPipPipPip
  • 1038 posts
  • Programming Language:Java

Posted 12 January 2009 - 02:31 AM

OMG, U r great!!!. Perfectly done :D
  • 0

Interested in participating in community events?
Want to harness your programming skill and turn it into absolute prowess?
Come join our programming events!


#3 Guest_Jordan_*

Guest_Jordan_*
  • Guest

Posted 12 January 2009 - 05:59 AM

Great tutorial, packed full of info! I'm going to submit this one to the tutorial sites.
  • 0

#4 outsid3r

outsid3r

    CC Devotee

  • Just Joined
  • PipPipPipPipPipPip
  • 494 posts

Posted 30 August 2009 - 04:40 AM

I learned something new with this tutorial, it's really very descriptive, great job :).
I just don't like to use NULL macro, i prefer just 0, and many people think that NULL is something of the past and should not be used in C++.
Anyway in the new C++ standard NULL will disappear and a new keyword will be available, called "nullptr", which represents 0, but it can't be used on integers, just strictly on pointers.
  • 0

#5 WingedPanther73

WingedPanther73

    A spammer's worst nightmare

  • Moderator
  • 17757 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others

Posted 31 August 2009 - 05:05 AM

Thanks much :)
  • 0

Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

My MineCraft server site: http://banishedwings.enjin.com/


#6 veda87

veda87

    CC Addict

  • Advanced Member
  • PipPipPipPipPip
  • 126 posts

Posted 17 September 2009 - 07:46 AM

thanks for this tutorial...
and when i tried this, I found this small mistake...

In

The code is:
arrayinsert.cpp:

Instead of
if(searcharray[(min+max)/2]=toadd)
    {
    }
it should be
if(searcharray[(min+max)/2]==toadd)
     {
     }

Even in

arraysearch.cpp

Instead of
if(searcharray[(min+size)/2]=tofind)
it should be
if(searcharray[(min+size)/2]==tofind)

  • 0

#7 WingedPanther73

WingedPanther73

    A spammer's worst nightmare

  • Moderator
  • 17757 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others

Posted 17 September 2009 - 08:01 PM

Good catches! Thanks :)
  • 0

Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

My MineCraft server site: http://banishedwings.enjin.com/


#8 TkTech

TkTech

    The Crazy One

  • Retired Mod
  • PipPipPipPipPipPipPip
  • 1144 posts
  • Location:Ottawa, Ontario

Posted 19 September 2009 - 11:42 AM

@outsid3r: NULL isn't guaranteed to be zero :) It just 'is' by convention. And there's nothing wrong with NULL, a concept of nothing always has to exist. It just poor programmers with poor checking that make it a faulty concept. **, MIPS has a $zero ($0) register just to express nothing.

@Winged: Nice tut, +Rep. We going to see a series on Tries or something a bit more complex? I'd love to see your implementation of a Patricia trie. We use one in pedigree for the VFS cache, but with your logic and math skills it would probably be much more efficient :D
  • 0
Helpful CODECALL Links: Join Us, Guidelines, FAQ, Post a Tutorial

#9 outsid3r

outsid3r

    CC Devotee

  • Just Joined
  • PipPipPipPipPipPip
  • 494 posts

Posted 19 September 2009 - 11:47 AM

I didn't said that there is anything wrong with NULL, i just don't like it at all, to express a pointer to nothing i just prefer 0, it's more simple.

What bjarne has to say about null (which i agree at all):

In C++, the definition of NULL is 0, so there is only an aesthetic difference. I prefer to avoid macros, so I use 0. Another problem with NULL is that people sometimes mistakenly believe that it is different from 0 and/or not an integer. In pre-standard code, NULL was/is sometimes defined to something unsuitable and therefore had/has to be avoided. That's less common these days.

If you have to name the null pointer, call it nullptr; that's what it's going to be called in C++0x. Then, "nullptr" will be a keyword.


  • 0

#10 TkTech

TkTech

    The Crazy One

  • Retired Mod
  • PipPipPipPipPipPipPip
  • 1144 posts
  • Location:Ottawa, Ontario

Posted 19 September 2009 - 11:51 AM

I'm sorry, a bit aggravated this morning :) I'm used to the most portable code possible, and that includes common pre-standards.

If bjarne sees this, I'd like to know what he has against macros (In PM of course, off topic here)
  • 0
Helpful CODECALL Links: Join Us, Guidelines, FAQ, Post a Tutorial

#11 WingedPanther73

WingedPanther73

    A spammer's worst nightmare

  • Moderator
  • 17757 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others

Posted 19 September 2009 - 01:52 PM

His issue with macros is pretty simple: they can avoid type checking, which makes them far less safe. That said, he strongly encourages #define for things like include guards. C++ was designed to make macros almost completely unnecessary, which is my biggest problem with wxWidgets.
  • 0

Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

My MineCraft server site: http://banishedwings.enjin.com/


#12 outsid3r

outsid3r

    CC Devotee

  • Just Joined
  • PipPipPipPipPipPip
  • 494 posts

Posted 19 September 2009 - 03:33 PM

Well said, that's exactly where C++ aims.
  • 0





Also tagged with one or more of these keywords: linked list, array

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download