**Problem: Given two arrays i.e. one as the sorting order array and the second to be sorted based upon it, print the sorted array.**

I recently saw this problem posted on some blog mentioning it was asked in a programming interview at Google. It looked particularly interesting because sorting is a well known problem with numerous algorithms each of which suits in a particular situation. However, it is rare that you have to extend those or write a complete sorting routine yourself.

I saw many related sorting discussions some of which are

http://forum.codecal...05-sorting.html

http://forum.codecal...ues-arrays.html

http://forum.codecal...ues-arrays.html

So I thought about it for a while and brought this to a tutorial with a couple of possible solutions.

**Problem Explanation**

To clarify the problem statement, let’s say you are given sorting order string “abc” and using it you sort the string “ccbaba”. It should result in “aabbcc”. However, if the order string is changed to cba, the result should be “ccbbaa”. (Just ran those instances on my solution to be sure).

There would be some constraints to this problem as I expect for e.g. every element in the ‘to be sorted array’ should be present in the ‘order array’. However, it is possible that you have elements in ‘order array’ that do not occur in sorted array. Also the ‘to be sorted array’ can have duplication where as duplicates make little sense for the order array and should not be allowed.

**First thought of Solution**

So how would one go about solving it? The first intuition that comes to mind should be to use the normal standard way of sorting (using any technique such as selection sort). The only required modification would be to the comparison part i.e. I can’t no longer say that a < c. I rather would have to find whether c comes first in the order array or a.

**Selection Sort**

The reason I chose selection sort is that it looks most conceptually trivial and intuitive to me. The first time I wrote a sorting function in my life was indeed selection sort. I originally read bubble sort, tried solving myself and only discovered latter that I coded selection sort.

The concept is to go through the entire array finding current max (or min) by keeping a separate variable as max and compare it with each element. Upon completion of one iteration of this loop, we have found one max element. Now we copy this element in the first place, and go through the rest of the array finding the next largest element. Hence we would have two nested loops. Upon completion of one iteration of external loop, we found one element’s final place.

The worst case run time complexity of this algorithm is O(n)^2.

**Apply Selection Sort to the problem**

We run two nested loops. The outer loop tells our current position for e.g. if it has ran twice it would mean we have extracted and placed the first two elements of the final sorted array in first and second index. The inner loop finds our current minimum or maximum depending upon if we are sorting in ascending or descending order.

The part where algorithm changes according to current problem is the comparison i.e. when we would compare if one char is greater or less than other.

What we would do here is to iterate through the order array and get index of the current element. It is comparison of these indexes that tells which of the two is smaller or greater than the other. Code follows.

void swap(char *fir, char *sec) { char temp = *fir; *fir = *sec; *sec = temp; } int indxof(char c, char *order) { for(int a=0; order[a] != '\0'; a++) if(order[a] == c) return a; return -1; } void sort_by_order(char *order, char*to_sort) { int min_indx_to_sort; for(int i=0; to_sort[i] != '\0'; i++) { int min_indx_odr = 100000000; // any value greater than max array index for(int j=i; to_sort[j] != '\0'; j++) { int result = indxof(to_sort[j], order); if(result != -1 && min_indx_odr > result) { min_indx_odr = result; min_indx_to_sort = j; } } swap(&to_sort[i], &to_sort[min_indx_to_sort]); } } int main() { char order[] = "dfbcae"; char tosort[] = "abcdeeabc"; sort_by_order(order, tosort); cout << tosort << endl; //output -> dbbccaaee return 0; }

There is a separate function for swapping two elements in place since it uses pointers. Then there is another function indxof, which takes a character and returns the integer index of that character in the order array. If the element does not exist, then it returns -1.

Finally it is our function sort_by_order which runs the main two loops.

Below is the output of the program when ran with the strings used in main.

The two nested loops similar to selection sort account for O(n)^2 worst case run time. For each of those instances we call index of function which could go up to O(n) time.

Hence our final complexity of this algorithm turns out to be O(n)^3.

This isn’t very impressive so can we do better?

**Faster Order sorting in O(n)^2**

Sure, one way to think of this is, how to get rid of the one extra loop? The reason we had to add this loop was that we don’t know the position of any element from the “to be sorted array” in the “order array”. To avoid that, what if we altogether start looking from order array instead of to be sorted array.

For e.g. if the first element in order array is ‘f’, we would know that in the final sorted array f should be at 0th index as long as it exists in the to be sorted array.

Based upon that we see a simple two nested loop algorithm. External loop iterates on the order array, where as internal loop goes through the actual array, and if an element is found it is swapped with the current index.

However, there are a few points that need to be taken care of. First, the outer array won’t have repetitions. But the inner one could have. So we need a separate variable keeping track of current index in the sorted array. Also we need to account for the possibility that an element exists in the order array but does not exist in the array to be sorted.

Here is the new faster function. It internally calls the same swap API pasted above.

void sort_fast_by_order(char *order, char *to_sort) { /* A separate variable is needed to keep track of how many elements we are currently done sorting with. * This helps resolving two problems * 1. The case where an element is repeated in the original array * 2. Order array contains an element which does not exist in the 'array to be sorted' */ int c_sorted = 0; for(int i=0; order[i] != '\0';i++) // for each element in order array { // see if it exists in the 'to be sorted array' and exchange with ith element for(int j = c_sorted; to_sort[j]!='\0'; j++) { if(order[i] == to_sort[j]) { swap(&to_sort[c_sorted], &to_sort[j]); c_sorted++; // We do not break from loop here because the element in order array can be repeated in the // to_sort array. We take a separate variable repeated which decides which instance to swap. } } } }

Note that we didn’t use strlen() anywhere to avoid the order n execution and instead checked all loops for terminating null ‘\0’. The same would be used internally by strlen() in case you are not aware.

The running time of above algorithm is O(n)^2. Can this be further improved?

**Can Order sorting be faster i.e. O(nlgn)?**

While I was thinking of above algo, I also found another aspect to it. When we go linear in the order array, for each element we only need to know if it exists in the order array. Then we can swap it with the current index element. What if at first we sort the actual array using standard > operator of ascii using any nlogn algorithm. Then we iterate through the order array, and for each element of order array, we do a binary search in the sorted array. Since binary search runs in O(lg n) so our entire algorithm would run in O(nlgn) in the worst case. That was just the idea and I didn’t proceed ahead to implement that. But mainly there is not much to it other than using either Merge or Heap sort both of which run in O(n log n) in the worst case.

**Edited by fayyazlodhi, 17 July 2011 - 12:06 AM.**

fixing links