Jump to content

Finding Longest Palindrome - how fast can it be??

- - - - -

  • Please log in to reply
5 replies to this topic

#1
fayyazlodhi

fayyazlodhi

    Programming Expert

  • Members
  • PipPipPipPipPipPip
  • 403 posts
Problem: Given a string find the longest possible palindrome in it efficiently


Quite a few years back - nearly a year out of college during my first job, I had a chance to interview with Google for a software development role. Though it didn’t turn out positive along with a few technical / non technical lessons learnt, it gave me a good idea of their freakiness about algorithmic efficiency among other things. I feel there would be a lot for others to learn from and there was no NDA :) so i am free to write about it. Plus it might solve problems faced by many as in
http://forum.codecal...comparison.html

So I was asked this question, which basically means that if I have “abcdefhghhgh”, the answer should be “hghhgh” though “hgh” itself is a palindrome though shorter one. For those who aren’t sure, a palindrome is a string (of numbers, characters etc.) that reads the same backwards as forwards i.e. “123321”.

The first thought that came to mind was to start out from the middle (single element for odd, and two elements for even length string) and keep comparing those outwards. Or this could otherwise be done having two indexes 0 and max size, keep on comparing them with each other while incrementing the first and decrementing the second.

However, this is only covering the case that if the entire string itself is a palindrome. So I realized this would only be a palindrome checker function which will be required any way in the solution. So it was a good idea to separate this one to be used latter.


bool palindrome_checker(char *str, int len)

{

    int start = 0;

    int end = len - 1; // 0 based - length 10 means index 0 - 9


    for(;start < end; start++, end--) // less than (<) is fine since middle one is not compared for odd length

    {

        if(str[start] != str[end])

            return false;

    }


    return true;

}


So given a string, this should test if palindrome or not. Now I need to figure out a way to sort of check all possible sub strings in the original one. Well, since we need to find the longest, it might be a good idea to choose a length (max length of string), then keep going like this.

First check all substrings of length max (there will be only one – the entire string itself).

Then, decrement length by 1, and check all substrings of the original string of length -1.

Continue following the above step until palindrome found or until you reach the trivial length 1 which basically means no real palindrome exists (each single character number can be considered a palindrome always).

In order to achieve this, I wrote another function called generate_all_str.

It basically takes the original string, total length and current length (the length we are currently checking for a possible palindrome) as parameters and sequentially goes through all substrings of that length calling the palindrome_checker function above. It returns pointer to the index from which the palindrome starts if found or null otherwise.


// This function basically walks through all sub strings of the passed length (curr_len), calling palindrome_checker

// for each and returns the first match found OR null. S is the original string.

char * generate_all_str(char * s, int curr_len, int total_len)

{

    for(int i=0; i<total_len; i++)

        if(palindrome_checker(&s[i],curr_len) == true)

            return &s[i];

    return NULL;

}


Finally from my main function, I ran a loop passing the original string and max length (decreasing length by one in every iteration) to the generate_all_str function. If such a string is found, I copy that to another local one (since I would like to print that by cout which needs a terminator for the palindrome – not the entire string).


#define SIZE 100


int main()

{

    char or_str[] = "abcdef1234334321ads";

    int total = strlen(or_str);

    char * pal;

    char res[SIZE];


    for (int l=total; l>1; l--)

    {

        pal = generate_all_str(or_str, l, total);


        if(pal!=NULL) // string found

        {

            strncpy(res, pal, l);

            res[l] = '\0';

            cout << res << endl;

            break;

        }

    }


    return 0;

}


The above works though it is a brute force solution and not very efficient. What is it’s running time complexity? Consider the case of a string in which there exists a palindrome of only length 2 where as the original string might be 100 characters.

So there is one loop in main which runs from length max to > 1 so this is O(n)
For each iteration of main loop generate_all_str is called once, which also runs a whole loop O(n). For each call of this function, there is a call to palindrome checker function which takes O(n) time.

So in total this appears like a nXnXn solution i.e. O(n^3) which might not be very appreciated for such a trivial problem.

The good and the hard part is, can we improve upon this?

Well I was asked by the interviewer to improve this solution. At that time, I thought of one vaguely which was not clear enough in my mind. But the idea was there and latter confirmed through code.

Consider the following:

For a given string, no matter where and how long a palindrome is, it will always have a center and would extend outwards. So why not try using this property and see if it improves the running time.

Basically we would run a loop from the start of the string to the end. In each iteration we would assume that the current index is the center of the palindrome (even or odd case needs to be take into account). If we pass the base case i.e. at least length 2(even) or 3 (odd) palindrome is found, we would extend that to the left and right until it remains a palindrome, if it breaks at a certain point, we would just store it along with length and move ahead.

The logic seems to bring some improvement i.e. one external loop and one internal loop for each iteration of external one in the worst case. So that makes it O(n^2) algorithm. Though the extend_palindrome function is called twice (once for even and once for odd), because you would never know in a given string position which one of the two might be valid and longer. However, this is still 2 * O(n) for the internal loop. Constants are ignored hence n^2 holds.


#include<iostream>

#include<iomanip>

#include<vector>

#include<algorithm>

#include<sstream>

#include<cstring>

using namespace std;


int extend_palindrome(char *s, int left, int right, int t_len)

{

    int p_len = 0; // found palindrome's length


    // even case, treat s[right] as center of palindrome

    for (;left >= 0 && right < t_len; p_len++, left--, right++)

    {

        if (s[left] != s[right])

            break;

    }


    return p_len;

}


int main()

{

    char or_str[] =  "abcdef1234334321ads";//"abcdefhghhgh"; //"11221134543112211";//"112211";; 

    int total = strlen(or_str);

    int max_p_len = 0;

    char longest_p[100];


    // starting from index 1, since palindrome should at least be of length 2

    for(int cen=1; cen < total; cen++)

    {

        int even_len = extend_palindrome(or_str, cen-1, cen, total);

        int odd_len = extend_palindrome(or_str, cen-1, cen+1, total);

        int cp_len;

        int to_be_copied_len = 0;


        if (even_len < odd_len)

        {

            cp_len = odd_len;

            to_be_copied_len = cp_len * 2 + 1;

        }


        else

        {

            cp_len = even_len;

            to_be_copied_len = cp_len * 2;

        }


        if(cp_len > max_p_len)

        {

            max_p_len = cp_len;


            strncpy(longest_p, &or_str[cen-cp_len], to_be_copied_len);

            longest_p[to_be_copied_len] = '\0';

        }

    }


    cout << "Longest palindrome is : " << longest_p << endl;


    return 0;

}


Fortunately, for a programmer (but unfortunate for me for that interview). This can further be optimized up to O(n). In fact straight out I was asked to reduce it to n and I was like “come again…..”

There is a data structure called suffix trees (that I had never heard of at that time) but this problem made me learn it. Wait for the next tutorial to get there.

Edited by fayyazlodhi, 27 June 2011 - 06:35 AM.
fix typo: palindrome definition

Today is the first day of the rest of my life

#2
lethalwire

lethalwire

    while(false){ ... }

  • Members
  • PipPipPipPipPipPipPip
  • 748 posts
  • Programming Language:Java, PHP
  • Learning:Java, PHP
:thumbup1:This was a cool read.
Thanks.

Quote

For those who aren’t sure, a palindrome is a number[string] that reads the same backwards as forwards i.e. “123321”.


#3
ZekeDragon

ZekeDragon

    Writes binary right handed and hex left handed

  • Moderators
  • 2,103 posts
Here's my pathetic attempt at getting O(n) efficiency:
class LongestPalindrome

{

public:

    char const *start_loc;

    int size;

};


void set_if_largest(LongestPalindrome &to_modify, char const *to_seek, int low, int high)

{

    int sz = high - low + 1;

    if (sz > to_modify.size)

    {

        to_modify.start_loc = &(to_seek[low]);

        to_modify.size = sz;

    }

}


LongestPalindrome find_longest_palindrome(char const *loc)

{

    // The first three conditions are sizes 0, 1, and 2. These are special conditions.

    LongestPalindrome ret_val;

    ret_val.start_loc = loc;

    ret_val.size = 0;

    

    if (*loc != '\0')

    {

        ret_val.size = 1; // Must now at least be 1!

        

        if (loc[1] != '\0' && loc[0] == loc[1])

        {

            ret_val.size = 2;

        }

        

        std::list<int> compares_positions;

        

        for (int i = 2; loc[i] != '\0'; ++i)

        {

            if (loc[i - 2] == loc[i])

            {

                compares_positions.push_back(i - 2);

                set_if_largest(ret_val, loc, i - 2, i);

            }

            if (loc[i - 1] == loc[i])

            {

                compares_positions.push_back(i - 1);

                set_if_largest(ret_val, loc, i - 1, i);

            }

            

            std::list<int>::iterator checks = compares_positions.begin();

            while (checks != compares_positions.end())

            {

                if (*checks >= 0 && loc[*checks] == loc[i])

                {

                    set_if_largest(ret_val, loc, *checks, i);

                    --(*checks);

                    ++checks;

                }

                else

                {

                    std::list<int>::iterator copy = checks++;

                    compares_positions.erase(copy);

                }

            }

        }

    }

    

    return ret_val;

}
I cried myself to sleep last night and pulled large chunks of hair out of my scalp due to the endless torrent of red-faced frustration this project turned into. There are quite a few choice words I have, but they're not friendly to this forum, and I don't even think it's O(n) efficiency yet. :(
Wow I changed my sig!

#4
fayyazlodhi

fayyazlodhi

    Programming Expert

  • Members
  • PipPipPipPipPipPip
  • 403 posts

lethalwire said:

:thumbup1:This was a cool read.
Thanks.

Thank you too lethalwire for pointing that out. I have edited with a clarification since palindromes need necessarily not be numbers. Didn't get a +rep though :)
Today is the first day of the rest of my life

#5
fayyazlodhi

fayyazlodhi

    Programming Expert

  • Members
  • PipPipPipPipPipPip
  • 403 posts
Thank you Zeke Dragon. It is pleasing to know that you found it interesting enough to engage you that long. I will take a closer look at your solution to be able to comment upon it. But i know from prior knowledge that the only way to get O(n) is using suffix trees building which is a good deal of effort. I am planning on coding that as a follow up so stay tuned.
Today is the first day of the rest of my life

#6
ZekeDragon

ZekeDragon

    Writes binary right handed and hex left handed

  • Moderators
  • 2,103 posts

fayyazlodhi said:

But i know from prior knowledge that the only way to get O(n) is using suffix trees building which is a good deal of effort. I am planning on coding that as a follow up so stay tuned.
Yeah, it's not. It's still O(n^2) and it's pretty obvious why. I'll wait for your course on it. :)

I was too tired at the moment of posting, just wanted to create something and I guess that's all I could get was an algorithm that would, at least I believe, probably sport better than most O(n^2) algorithms out there... won't bet anything on it though.
Wow I changed my sig!




1 user(s) are reading this topic

0 members, 1 guests, 0 anonymous users