+ Reply to Thread
Page 1 of 2
1 2 LastLast
Results 1 to 10 of 17

Thread: Searching and Inserting: Arrays vs Linked Lists

  1. #1
    Super Moderator WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther's Avatar
    Join Date
    Jul 2006
    Age
    37
    Posts
    13,155
    Blog Entries
    59

    Searching and Inserting: Arrays vs Linked Lists

    In previous tutorials we talked about arrays and linked lists as mechanisms for storing list data. In this tutorial, we'll look at some properties of arrays and linked lists, and try to find a way to get the best of each structure.

    There are two basic activities that will introduce challenges to the usefulness of arrays and linked lists: searching and inserting data. We'll talk about searching first, as it will motivate the need to carefully insert data.

    Let's say you have a list of numbers: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19. A typical activity is based on the desire to search that list for a particular value that may or may not be in the list and return its location. If I asked you to find the location of '9', for example, you would probably scan across the list of values until you spotted it, and announce "It's right there!" You might even say "It's the sixth number on the list!" This is precisely the strategy of searching that is used in a linked list.

    Code:
             +---------+   +---------+   +---------+   +---------+   +---------+   
    head --> | 1, next---> | 3, next---> | 5, next---> | 7, next---> | 9, next---> ...
             +---------+   +---------+   +---------+   +---------+   +---------+   
    
                 ^             ^             ^             ^             ^
                 |             |             |             |             |
               search 1      search 2      search 3      search 4      found on 5
    The code is:
    linkedsearch.cpp:
    Code:
    struct node
    {
      int data;
      node* next;
    };
    
    node* find(node* head,int tofind)
    {
      node* temp=head;
      while (temp->data < tofind) 
      {
        temp=temp->next;
      }
      if (temp->data == tofind)
        return temp;
      else
        return NULL;
    }
    As you can see, we look through the list for the value and return either a pointer to it, or a null pointer to indicate that it wasn't found (when we find a larger value). Since, on average, our value will be in the middle of the list, it will take an average of n/2+1 comparisons (+1 for the final check of = or >) to get our result. This is O(n). To see the average search times, look at the following chart:
    Code:
    list size:    10    100   1,000   10,000   100,000   1,000,000
    ave. comps.:   6     51     501    5,001    50,001     500,001
    What O(n) tells us is this: once the number of elements is large enough, multiplying the number of elements by y multiplies your metric (in this case average comparisons) by y as well (when focused on significant digits). This introduces an issue for us: while this is good, it also kind of sucks. Consider writing a database application: you have your application and you test it with 1000 records and find that your search for records averages 500 ms. You then release your application to a customer who adds 10,000,000 records and finds that searches are taking 5,000,000 ms (or 5,000 s, or 1 hr, 23 min, 20 s). I can pretty much guarantee that you would not have that customer for long. There has to be a better way to search... and there is.

    Look at the list of numbers again: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19. Since it's in order, I can approach the problem from a different perspective: look at the middle value (9 in this case) Is it the value we're looking for? Is it lower than what we're looking for? If it isn't the value, then we can focus our search on the top or bottom half of our list. 2 comparisons throw out half the list. Repeating the process will result in every 2 comparisons chopping the potential chunk of list in half. With arrays, we can find the middle of our list easily. Below is the logic to find 13:

    Code:
    Index bounds are 0 - 9, midpoint is index 4
    +----++----++----++----++----++----++----++----++----++----+
    |  1 ||  3 ||  5 ||  7 ||  9 || 11 || 13 || 15 || 17 || 19 |
    +----++----++----++----++----++----++----++----++----++----+
       0     1     2     3    *4*    5     6     7     8     9
    
    9<13 so index bounds are 5-9, midpoint is index 7
    +----++----++----++----++----+
    | 11 || 13 || 15 || 17 || 19 |
    +----++----++----++----++----+
       5     6    *7*    8     9
    
    15>13 so index bounds are 5-6, midpoint is index 5
    +----++----+
    | 11 || 13 |
    +----++----+
      *5*    6
    
    11<13 so index bounds are 6-6, midpoint is index 6
    +----+
    | 13 |
    +----+
      *6*
    
    found at index 6!
    The resulting code is:
    arraysearch.cpp:
    Code:
    int find(int* searcharray, int tofind, int size)
    {
      int min=0;
      while (min <= size)
      {
        if(searcharray[(min+size)/2]=tofind)
        {
          return (min+size)/2;
        }
        else if(searcharray[(min+size)/2]<tofind)
        {
          min=(min+size)/2+1;
        }
        else
        {
          size=(min+size)/2-1;
        }
      }
      return -1;
    }
    This search has a worst case scenario of approximately 2*log_2(n) (there may be a + or – 1 in there, but I'm feeling a bit lazy). To see the worst case search times, look at the following chart:
    Code:
    list size:    10    100   1,000   10,000   100,000   1,000,000
    ave. comps.:   7     13      20       27        33          40
    This is a search that is O(log(n)). What it tells us is that, once the number of elements is large enough, multiplying the number of elements by y adds z to your metric. Here, multiplying by 10 adds about 6.6 comparisons. Repeating the above scenario, you have your application and you test it with 1000 records and find that your search for records averages 500 ms. You then release your application to a customer who adds 10,000,000 records and finds that searches are taking 1175 ms (or 1s 175ms. At that point, your customer may notice that it's "a little slower" than the demo you gave, "but still pretty good".

    Based on what we've seen so far, this clearly indicates that arrays are far better than linked lists, and we should immediately stop using linked lists, right? Not so fast! There was a requirement on that array: it had to contain the elements in order! That doesn't just happen by magic. Every time you add a new element to your list of data, there is an overhead cost. If you are not willing to accept that overhead cost, your data will not be sorted, and an array cannot pull that neat trick. An array search would then be just as slow as a linked list search: go through all the data and hope you find it early.

    There are two ways to deal with this situation. 1) sort the data before doing a search, or 2) make sure the data remains sorted as you add new values. Since doing a sort is an expensive operation ( O(n*log(n)) or worse, depending on algorithm ), we're going to opt for keeping our data sorted as we add values. This means we need to look at how expensive it will be to insert an item. To keep things simple, we'll just look at the cost of insertion, and remember that there is always going to be an overhead cost (finding where the data goes). The total cost on an insert will be [search cost] + [insert cost], which we will deal with after we look at the cost of an insert.

    A linked list is pretty simple, just adjust the pointers (in this case to add 6):
    Code:
    this:
                     +---------+   +---------+   
    head --> ... --> | 5, next---> | 7, next---> ...
                     +---------+   +---------+   
    
                              +---------+
                              | 6, next---> NULL
                              +---------+
    
    
    becomes:
                     +---------+   +---------+   
    head --> ... --> | 5, next |   | 7, next---> ...
                     +-----|---+   +---------+   
                           |               ^
                           |   +---------+ |
                           +-> | 6, next---+
                               +---------+
    Here's the code to insert data into a linked list:
    linkedinsert.cpp:
    Code:
    node* insert(node* head, int toadd)
    {
      node* temp;
      if (head->data < toadd)
      {
        temp=new node;
        temp->data = toadd;
        temp->next = head;
        return temp;
      }
      else
      {
        while ((temp->next!=NULL)&&((temp->next)->data < toadd))
        {
          temp=temp->next;
        }
        if (temp->data < toadd)
        {
          temp->next = new node;
          temp=temp->next;
          temp->data = toadd;
        }
        else
        {
          node* temp2=new node;
          temp2->data = toadd;
          temp2->next = temp->next;
          temp->next = temp2;
          temp=temp2;
        }
        return temp;
      }
    }
    The cost of this operation is very low. 2 pointer assignments do the job. By contrast, to insert into an array you have to move elements out of the way, like this:
    Code:
    Insert 6 at index 3
    +----++----++----++----++----++----++----++----++----++----++----+
    |  1 ||  3 ||  5 ||  7 ||  9 || 11 || 13 || 15 || 17 || 19 ||    |
    +----++----++----++----++----++----++----++----++----++----++----+
       0     1     2     3     4     5     6     7     8     9    10
    
    move elements to the right to free space
    +----++----++----++----++----++----++----++----++----++----++----+
    |  1 ||  3 ||  5 ||  7 ||  9 || 11 || 13 || 15 || 17 || 19 || 19 |
    +----++----++----++----++----++----++----++----++----++----++----+
       0     1     2     3     4     5     6     7     8     9    10
    
    +----++----++----++----++----++----++----++----++----++----++----+
    |  1 ||  3 ||  5 ||  7 ||  9 || 11 || 13 || 15 || 17 || 17 || 19 |
    +----++----++----++----++----++----++----++----++----++----++----+
       0     1     2     3     4     5     6     7     8     9    10
    
    +----++----++----++----++----++----++----++----++----++----++----+
    |  1 ||  3 ||  5 ||  7 ||  9 || 11 || 13 || 15 || 15 || 17 || 19 |
    +----++----++----++----++----++----++----++----++----++----++----+
       0     1     2     3     4     5     6     7     8     9    10
    
    +----++----++----++----++----++----++----++----++----++----++----+
    |  1 ||  3 ||  5 ||  7 ||  9 || 11 || 13 || 13 || 15 || 17 || 19 |
    +----++----++----++----++----++----++----++----++----++----++----+
       0     1     2     3     4     5     6     7     8     9    10
    
    +----++----++----++----++----++----++----++----++----++----++----+
    |  1 ||  3 ||  5 ||  7 ||  9 || 11 || 11 || 13 || 15 || 17 || 19 |
    +----++----++----++----++----++----++----++----++----++----++----+
       0     1     2     3     4     5     6     7     8     9    10
    
    +----++----++----++----++----++----++----++----++----++----++----+
    |  1 ||  3 ||  5 ||  7 ||  9 ||  9 || 11 || 13 || 15 || 17 || 19 |
    +----++----++----++----++----++----++----++----++----++----++----+
       0     1     2     3     4     5     6     7     8     9    10
    
    +----++----++----++----++----++----++----++----++----++----++----+
    |  1 ||  3 ||  5 ||  7 ||  7 ||  9 || 11 || 13 || 15 || 17 || 19 |
    +----++----++----++----++----++----++----++----++----++----++----+
       0     1     2     3     4     5     6     7     8     9    10
    
    and now add 6 at index 3
    +----++----++----++----++----++----++----++----++----++----++----+
    |  1 ||  3 ||  5 ||  6 ||  7 ||  9 || 11 || 13 || 15 || 17 || 19 |
    +----++----++----++----++----++----++----++----++----++----++----+
       0     1     2     3     4     5     6     7     8     9    10
    The code is:
    arrayinsert.cpp:
    Code:
    int insert(int* searcharray, int toadd, int size)
    {
      int min=0;
      int max=size;
      while (min <= max)
      {
        if(searcharray[(min+max)/2]=toadd)
        {
          min = (min+max)/2;
          for(int i=size;i>min;i--)
          {
            searcharray[i+1]=searcharray[i];
          }
          searcharray[min]=toadd;
          return min;
        }
        else if(searcharray[(min+max)/2]<toadd)
        {
          min=(min+max)/2+1;
        }
        else
        {
          max=(min+max)/2-1;
        }
      }
      min=(min+max)/2;
      if (searcharray[min+1]<toadd)
      {
        min=min+1;
      }
      for(int i=size;i>min;i--)
      {
        searcharray[i+1]=searcharray[i];
      }
      searcharray[min]=toadd;
      return min;
    }
    Here, the big problem is that we have to move elements out of the way of the new one. The cost of the operation depends on the size of the array and the position in the array, but on average it is about n/2 moves of data. At this point, it is worth noting that these two operations are not easy to compare (as simple as it looked). The reason is that a pointer typically occupies about 4 bytes of data. The result is that assigning two pointers is not just 2 operations, it is 2 cheap operations. By contrast, moving the actual data of interest can be quite expensive. Imagine if each element of the array is a class containing 1 MB of data! If you only have to move 5 elements of the array, that is 5MB of data, compared to 8 B.

    The total cost of search plus insert becomes: linked list = n/2 + 1 + 2, array = 2*log_2(n) + (n/2)*(cost of move). Both are O(n) for the total insert, with array being potentially much slower depending on the type of data involved. Based on these two methods for storing data, it looks like you should use arrays if you are mainly searching for data that is small, and you should use linked lists if you have a lot of inserts or the data is large.

    In the next tutorial, we'll try to get the best of both worlds
    CodeCall Blog | CodeCall Wiki
    Programming is a branch of mathematics.
    My CodeCall Blog | My Personal Blog

  2. #2
    Guru MathX has a spectacular aura about MathX has a spectacular aura about MathX's Avatar
    Join Date
    Oct 2008
    Location
    Kosovo
    Age
    19
    Posts
    4,007

    Re: Searching and Inserting: Arrays vs Linked Lists

    OMG, U r great!!!. Perfectly done

  3. #3
    Administrator Jordan is a name known to all Jordan is a name known to all Jordan is a name known to all Jordan is a name known to all Jordan is a name known to all Jordan is a name known to all Jordan's Avatar
    Join Date
    Nov 2005
    Location
    Hendersonville, NC
    Posts
    24,750
    Blog Entries
    97

    Re: Searching and Inserting: Arrays vs Linked Lists

    Great tutorial, packed full of info! I'm going to submit this one to the tutorial sites.

  4. #4
    Programming God outsid3r has a spectacular aura about outsid3r has a spectacular aura about outsid3r's Avatar
    Join Date
    Jul 2008
    Posts
    623

    Re: Searching and Inserting: Arrays vs Linked Lists

    I learned something new with this tutorial, it's really very descriptive, great job .
    I just don't like to use NULL macro, i prefer just 0, and many people think that NULL is something of the past and should not be used in C++.
    Anyway in the new C++ standard NULL will disappear and a new keyword will be available, called "nullptr", which represents 0, but it can't be used on integers, just strictly on pointers.

  5. #5
    Super Moderator WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther's Avatar
    Join Date
    Jul 2006
    Age
    37
    Posts
    13,155
    Blog Entries
    59

    Re: Searching and Inserting: Arrays vs Linked Lists

    Thanks much
    CodeCall Blog | CodeCall Wiki
    Programming is a branch of mathematics.
    My CodeCall Blog | My Personal Blog

  6. #6
    Programmer veda87 is an unknown quantity at this point veda87's Avatar
    Join Date
    Aug 2009
    Posts
    125

    Re: Searching and Inserting: Arrays vs Linked Lists

    thanks for this tutorial...
    and when i tried this, I found this small mistake...

    In
    The code is:
    arrayinsert.cpp:
    Instead of
    Code:
    if(searcharray[(min+max)/2]=toadd)
        {
        }
    it should be
    Code:
    if(searcharray[(min+max)/2]==toadd)
         {
         }
    Even in
    arraysearch.cpp
    Instead of
    Code:
    if(searcharray[(min+size)/2]=tofind)
    it should be
    Code:
    if(searcharray[(min+size)/2]==tofind)

  7. #7
    Super Moderator WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther has much to be proud of WingedPanther's Avatar
    Join Date
    Jul 2006
    Age
    37
    Posts
    13,155
    Blog Entries
    59

    Re: Searching and Inserting: Arrays vs Linked Lists

    Good catches! Thanks
    CodeCall Blog | CodeCall Wiki
    Programming is a branch of mathematics.
    My CodeCall Blog | My Personal Blog

  8. #8
    The Crazy One TkTech will become famous soon enough TkTech will become famous soon enough TkTech's Avatar
    Join Date
    Jun 2006
    Location
    Canada
    Age
    19
    Posts
    1,406
    Blog Entries
    1

    Re: Searching and Inserting: Arrays vs Linked Lists

    @outsid3r: NULL isn't guaranteed to be zero It just 'is' by convention. And there's nothing wrong with NULL, a concept of nothing always has to exist. It just poor programmers with poor checking that make it a faulty concept. Hell, MIPS has a $zero ($0) register just to express nothing.

    @Winged: Nice tut, +Rep. We going to see a series on Tries or something a bit more complex? I'd love to see your implementation of a Patricia trie. We use one in pedigree for the VFS cache, but with your logic and math skills it would probably be much more efficient

  9. #9
    Programming God outsid3r has a spectacular aura about outsid3r has a spectacular aura about outsid3r's Avatar
    Join Date
    Jul 2008
    Posts
    623

    Re: Searching and Inserting: Arrays vs Linked Lists

    I didn't said that there is anything wrong with NULL, i just don't like it at all, to express a pointer to nothing i just prefer 0, it's more simple.

    What bjarne has to say about null (which i agree at all):

    In C++, the definition of NULL is 0, so there is only an aesthetic difference. I prefer to avoid macros, so I use 0. Another problem with NULL is that people sometimes mistakenly believe that it is different from 0 and/or not an integer. In pre-standard code, NULL was/is sometimes defined to something unsuitable and therefore had/has to be avoided. That's less common these days.

    If you have to name the null pointer, call it nullptr; that's what it's going to be called in C++0x. Then, "nullptr" will be a keyword.

  10. #10
    The Crazy One TkTech will become famous soon enough TkTech will become famous soon enough TkTech's Avatar
    Join Date
    Jun 2006
    Location
    Canada
    Age
    19
    Posts
    1,406
    Blog Entries
    1

    Re: Searching and Inserting: Arrays vs Linked Lists

    I'm sorry, a bit aggravated this morning I'm used to the most portable code possible, and that includes common pre-standards.

    If bjarne sees this, I'd like to know what he has against macros (In PM of course, off topic here)

+ Reply to Thread
Page 1 of 2
1 2 LastLast

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

     

Similar Threads

  1. Help with inserting into a linked list (C)
    By Mt570 in forum C and C++
    Replies: 6
    Last Post: 09-30-2009, 11:02 PM
  2. Linked Lists
    By chili5 in forum Java Tutorials
    Replies: 2
    Last Post: 09-01-2009, 01:10 PM
  3. Linked Lists
    By WingedPanther in forum C Tutorials
    Replies: 5
    Last Post: 12-30-2008, 09:34 AM
  4. Linked Lists
    By Andrew.Anderson.2008 in forum C and C++
    Replies: 1
    Last Post: 06-27-2008, 08:39 AM
  5. Linked lists of strings (C)
    By Panserbjorn in forum C and C++
    Replies: 1
    Last Post: 10-26-2007, 01:58 PM