Jump to content

LCS Problem

- - - - -

  • Please log in to reply
26 replies to this topic

#1
Anemone

Anemone

    Newbie

  • Members
  • PipPip
  • 15 posts
Hey!

Im new to Java and have a problem
I need to calculate longest common substring between two
strings and show it like this:

Posted Image

my code for get the LCS
from two strings is this:

public class FindLCS {


public static String lcs(String str1, String str2)

{

int i, j;

int m = str1.length();

int n = str2.length();


String[][] b = new String [m][n];


for (int s = 1; s<m; s++)

b[s][0] = "";

for (int z = 1; z<n; z++)

b[0][z] = "";


for (i = 1; i<m; i++)

for(j = 1; j<n; j++)

{

if(str1.charAt(i-1) == str1.charAt(j-1))

b[i][j] = b[i-1][j-1]+str1.charAt(+1);

else if(b[i-1][j].length() >= b[i][j-1].length())

b[i][j]= b [i-1][j];

else

  b[i][j] = b[i][j-1];

}

return b [m-1][n-1]

}

}



and my view class
so far

public class LCSView extends JFrame

{

public void Frame()

{

JFrame frame = new JFrame();


setSize(500,200);

}

public void modString(String str1, String str2)

{

repaint();

}

public void paintComponent(Graphics g)

{

g.setColor(Color.BLACK);


g.drawString(str1, 50, 50);

g.drawString(str2, 50, 60);


int w = g.getFontMetrics().charWidth('a');


}



the method modstrings changes these two strings so
that they repaints with the common subsequences they have.

I want to create a line between all the subsequence
in in the same color as viewing a sub from 2 strings
so all lcs between two strings should be painted
not sure how to do it
so would be very helpful if someone could help out here

main should probably look like this:

public class main

{

public static void main (String[] args)

{

String strTest1 = JOptionPane.showInputDialog("Type the first String");

String strTest2 = JOptionPane.showInputDialog("Type the second String");


Frame progam = new Frame();

program.show();

}

}



The main methods should repeatedly ask a user to provide two strings
and use the view class to display them
first time posting here so hope I have provided with enough
for you to understand

#2
wim DC

wim DC

    Writes binary right handed and hex left handed

  • Members
  • PipPipPipPipPipPipPipPipPip
  • 2,084 posts
  • Programming Language:Java, JavaScript, PL/SQL
  • Learning:Java
Your FindLCS class was not working with me. I found it a bit too complicated for my liking to start debugging it so i erased it and started from scratch.
I ended up with:

public class FindLCS {


    public static String lcs(String str1, String str2) {

        String lcs ="";


        for (int i = 0; i < str1.length(); i++) {

            for (int j = str1.length()-1 ; j>=i; j--) {

                String subString = str1.substring(i,j);

                if(str2.contains(subString) && lcs.length()< subString.length()){

                    lcs = subString;

                }

            }

        }


        return lcs;

    }

}


And i also changed the view class a little bit.
... a lot.

public class LCSView extends JFrame {

    private String str1, str2, lcs;


    LCSView(String strTest1, String strTest2) {

        super("LCS");

        str1 = strTest1;

        str2 = strTest2;

        lcs = FindLCS.lcs(str1, str2);

        setSize(500, 200);

        setVisible(true);

    }   


    public void paint(Graphics g) {

        String[] strings = prepaint();

        int x = 50, x1=0, x2=0;


        g.setColor(Color.BLACK);

        g.drawString(strings[0], x, 50);

        x += getStringLength(g, strings[0]);


        g.setColor(Color.GREEN);

        g.drawString(strings[1], x, 50);

        x += getStringLength(g, strings[1]);


        g.setColor(Color.BLACK);

        g.drawString(strings[2], x, 50);


        x = 100;

        g.drawString(strings[3], x, 80);

        x += getStringLength(g, strings[3]);


        g.setColor(Color.GREEN);

        g.drawString(strings[4], x, 80);

        x += getStringLength(g, strings[4]);


        g.setColor(Color.BLACK);

        g.drawString(strings[5], x, 80);


        x1 = 50 + getStringLength(g, strings[0])+2;

        x2 = 100 + getStringLength(g, strings[3])+2;

        g.setColor(Color.GREEN);

        for(int i=0 ; i<lcs.length() ; i++){

            g.drawLine(x1, 50, x2, 71);

            x1 += getStringLength(g, lcs.charAt(i)+"");

            x2 += getStringLength(g, lcs.charAt(i)+"");

        }

        

    }


    private int getStringLength(Graphics g, String str){

        return g.getFontMetrics().charsWidth(str.toCharArray(), 0, str.length());

    }


    private String[] prepaint() {

        String[] strings = new String[6];

        int index = str1.indexOf(lcs);

        strings[0] = str1.substring(0, index-1);

        strings[1] = str1.substring(index, index + lcs.length());

        strings[2] = str1.substring(index + lcs.length());


        index = str2.indexOf(lcs);

        strings[3] = str2.substring(0, index-1);

        strings[4] = str2.substring(index, index + lcs.length());

        strings[5] = str2.substring(index + lcs.length());


        return strings;

    }

}


You can propably put some stuff from the paint method in some loops.


As main i used:

LCSView view = new LCSView("aaaaaaaabbb[B]cccc[/B]dd", "hhihihihi[B]cccc[/B]ccckkkk");


I didn't do the thing with asking the user every now and then.
View will just need a method that takes the 2 strings, changes the LCS in view class and repaint(); itself.

#3
Anemone

Anemone

    Newbie

  • Members
  • PipPip
  • 15 posts
Hey thank you for the quick reply

the problem is that with the find class
I should get a matrix that if you have
two strings: "abcs" and "sac"
it should create a matrix that should look like this

"" "" "" ""
"" "" "a" "a"
"" "" "a" "a"
"" "" "a" "ac"
"" "s" "s" "ac"

so the first row and column only consist of empty strings
and the solution to this is the string in the lower right corner of the matrix

#4
wim DC

wim DC

    Writes binary right handed and hex left handed

  • Members
  • PipPipPipPipPipPipPipPipPip
  • 2,084 posts
  • Programming Language:Java, JavaScript, PL/SQL
  • Learning:Java
I don't understand in your code it creates an array the size of the 2 words:
int m = str1.length();
int n = str2.length();

String[][] b = new String [m][n];

And with abcs & sac you have a [5][4] array.

#5
Anemone

Anemone

    Newbie

  • Members
  • PipPip
  • 15 posts
the code should match all the
substrings and create a matrix
thats what i want with that code

the reason for this is that the view
program should print a line between all the
substrings.
not just the longest.
I tried your solution works great! but I need
all the substrings
if you have a string like "ACGTTACGT"
and "ACAGTGTTA"

the result should print a line between like this for 2 strings
Posted Image
so you have a line with between all the substrings that can be found.
so I tried to run it

}

return matrix [m-1][n-1];

}

public static void main(String [] args) {

String str1 = "abcs";

String str2 = "sac";

System.out.print(lcs(str1,str2));

}

}


result: nullbbb

not quiet what I want
do you see what the problem is?

Edited by Anemone, 28 November 2010 - 04:52 AM.


#6
wim DC

wim DC

    Writes binary right handed and hex left handed

  • Members
  • PipPipPipPipPipPipPipPipPip
  • 2,084 posts
  • Programming Language:Java, JavaScript, PL/SQL
  • Learning:Java
Okay, still trying to understand the matrix..
I made this in excel
[ATTACH]3461[/ATTACH]
As you see, every diagonal (top left, down right) that has more than one '1' is an equal substring (single letter substrings ignored)

Now my question is. You see to the right that the subString "AC" from the word "acagtgtta" occurs 2 times in "acgttacgt", (yellow and green)
Will you in the end, end up with multiple lines starting from 1 substring, pointing to multiple occurences in the 2nd String? Like so:
[ATTACH]3462[/ATTACH]

Attached Files



#7
Anemone

Anemone

    Newbie

  • Members
  • PipPip
  • 15 posts
Sorry the problem is that I need to have a matrix displayed in that way
thank you for trying :)!
no the substring should only be used once when its found
as the print tell its only lined between the first AC and not the other

#8
wim DC

wim DC

    Writes binary right handed and hex left handed

  • Members
  • PipPipPipPipPipPipPipPipPip
  • 2,084 posts
  • Programming Language:Java, JavaScript, PL/SQL
  • Learning:Java
I currently made this matrix, is this the one you need?
 |  A  |  C  |  G  |  T  |  T  |  A  |  C  |  G  |  T  |
A|   A |     |     |     |     |   A |     |     |     | 
C|     |  AC |     |     |     |     |  AC |     |     | 
A|   A |     |     |     |     |   A |     |     |     | 
G|     |     |   G |     |     |     |     |   G |     | 
T|     |     |     |  GT |   T |     |     |     |  GT | 
G|     |     |   G |     |     |     |     |   G |     | 
T|     |     |     |  GT |   T |     |     |     |  GT | 
T|     |     |     |   T | GTT |     |     |     |   T | 
A|   A |     |     |     |     | GTTA |     |     |     | 


#9
Anemone

Anemone

    Newbie

  • Members
  • PipPip
  • 15 posts
this should probably
be the case to calculate it

The first row and column consist only of empty strings (one of the two strings for which we compute the lcs is empty; then the lcs is also empty).
Consider an arbitrary i>0 and j>0. Recall that we here consider only the first i characters of str1 and the first j characters of str2.

If the i:th character of str1 and the j:th character of str2 are equal, then the matrix element in this position must end with this common character. And before that character should come the solution to the lcs problem obtained when the last character is erased in both strings; but this solution is the matrix element in position (i-1,j-1). (So, in position (3,3) the common letter 'c' is added at the end of the lcs "a" from position (2,2)).
If they are not equal, the lcs in this position is the longer of the two matrix elements immediately above and immediately to the left. (Here is an ambiguity if these two elements have the same length).

The solution to the problem should end up in the lower right corner of the matrix. and fill in the elements of the matrix according to this

So if you have "abcs" and "sac"

it should be :


"" "" "" ""
"" "" "a" "a"
"" "" "a" "a"
"" "" "a" "ac"
"" "s" "s" "ac"

hope this helps

#10
wim DC

wim DC

    Writes binary right handed and hex left handed

  • Members
  • PipPipPipPipPipPipPipPipPip
  • 2,084 posts
  • Programming Language:Java, JavaScript, PL/SQL
  • Learning:Java
Okay, i understand the logic to create such a matrix, but i don't understand how that gives a solution for the longest common string..

#11
Anemone

Anemone

    Newbie

  • Members
  • PipPip
  • 15 posts
this class that makes the matrix should be used
to the view class I think and through that its
easier to get that view as i displayed where every
substring is green marked
The longest common always ends up in the lower right corner

so this matrix (I think) should make it simplier to impliment the view class.

#12
wim DC

wim DC

    Writes binary right handed and hex left handed

  • Members
  • PipPipPipPipPipPipPipPipPip
  • 2,084 posts
  • Programming Language:Java, JavaScript, PL/SQL
  • Learning:Java
"ac" is no substring of "abcs"... it's no longest common substring.
This is what i end up with acgttacgt and acagtgtta... without those 2 empty rows /columns

 |    A   |    C   |    G   |    T   |    T   |    A   |    C   |    G   |    T   |

A|       A|       A|       A|       A|       A|       A|       A|       A|       A|

C|       A|      AC|      AC|      AC|      AC|      AC|      AC|      AC|      AC|

A|       A|      AC|      AC|      AC|      AC|     ACA|     ACA|     ACA|     ACA|

G|       A|      AC|     ACG|     ACG|     ACG|     ACG|     ACG|    ACAG|    ACAG|

T|       A|      AC|     ACG|    ACGT|    ACGT|    ACGT|    ACGT|    ACGT|   ACAGT|

G|       A|      AC|     ACG|    ACGT|    ACGT|    ACGT|    ACGT|   ACGTG|   ACGTG|

T|       A|      AC|     ACG|    ACGT|   ACGTT|   ACGTT|   ACGTT|   ACGTT|  ACGTGT|

T|       A|      AC|     ACG|    ACGT|   ACGTT|   ACGTT|   ACGTT|   ACGTT|  ACGTTT|

A|       A|      AC|     ACG|    ACGT|   ACGTT|  ACGTTA|  ACGTTA|  ACGTTA|  ACGTTA|

Now, how is the view class supposed to know that GTTA is a substring of both?




1 user(s) are reading this topic

0 members, 1 guests, 0 anonymous users