A Visual Guide to Longest Common Subsequence
Another type of problem that arises in computer science is finding the longest common subsequence. If you have the following string, BEADCA and ABCDA, what is the longest common subsequence? You can probably quickly figure out that the longest common subsequence of these two strings is BDA. It is possible to have multiple subsequences that are of equal length that would work. Let’s see how we would find the longest common subsequence in this problem.
We begin by constructing a matrix. One sequence goes on top of the matrix and the other goes on the left side. For every row/column match, we’ll enter a 1. If there is no match, we’ll enter a zero, unless the adjacent cell is already higher than zero, in which case we enter that value. For all other matches where the adjacent cell is already greater than one, 1 will be added to the new cell. Let’s see what that looks like in practice.
We’ll start by populating the first row and first column with zeros.
We then start with the first row, B. Row B is compared with the column A. Since there’s no match, a zero is entered for that spot.
The next comparison is between row B and column B. Since they match, we’ll add a value of 1, which is added from the cell (_,A).
We find out that there is no match when comparing row B to column C. We carry the 1 from cell (B,B).
Similarly, we carry the 1 from cell (B,C) to cell (B,D) and to cell (B,A).
We move to the second row. For row E, we compare it to the first column. Since they’re not the same, we enter a zero for that cell. We move to column B. Even though there is no match, we can carry the 1 from cell (B,B).
Since there are no other column values that match row E, we can carry the 1 across.
We move down to row A. Row A and column A are identical, so we enter a 1 in cell (A,A).
Moving through, the next values do not match: A does not match B. However, we can bring down 1 from above.
Row A does not match with either C or D, so we’ll just move the 1 from cell (A,B) to those cells.
The last cell does match. We’ll add 1 to the top left cell (E,D) and enter the new value of 2 into cell (A,A).
We move to the next row down. Row D has no match with the first column, so we’ll just bring the 1 down. It also has no matches with the next two columns, so we’ll move 1 from cell (D,A) to cells (D,B) and (D,C).
Row D does match column D, so we’ll add 1 to the cell that’s located in the top left corner, (A,C). Whenever there’s a match, you will always add 1 to the top left cell and insert it into the new cell. Those transitions will aid you in creating the longest common subsequence.
Row D does not match column A, so we’ll just move the largest value into its location.
Moving on to row C, the first cells receive a value of 1 since that is the largest value surrounding them. Column A does not match row C.
The next cell does match, so we’ll add 1 to the value located in the top-left cell and add it to the new cell.
The next two cells do not match, so we’ll just move the largest value into those cells. We’ll explain why we moved from above for cell (C,D) and from the left for cell (C,A) when we complete this table. For now, just know that we’re trying to get the longest path possible.
On the last row, A does match column A, so we’ll add 1 to the top-left cell and insert the value into the new cell.
For cell (A,B) we’ll move the largest value into it.
Cell (A,C) contains a larger value above than to the left of it, so we’ll move the larger value from above into it. As a rule of thumb, if the value to the left is the same as the value above, move the value from above. That doesn’t mean that you shouldn’t account for other paths coming from the left. It could be that an equal size subsequence comes from the left.
The next cell’s row does not match the column value, so we’ll move the largest value surrounding it into the new cell.
Finally, the last cell’s row and column values do match, so we’ll add 1 to the
top-left cell and enter the new value there.
We know that the largest common subsequence is going to contain 3 letters, so we’ll try to follow the path up, starting at the last transitional cell (the cell with the 1 added to it from the top-left cell) and see what those values are going to be.
We can quickly see that the longest common subsequence is going to be B,D,A.
If you liked what you read, check out my book, An Illustrative Introduction to Algorithms.