
Algorithms: Dynamic Programming - Longest Common Sub-sequence with C Program Source Code





Longest Common Subsequence


A subsequence of a given sequence is the given sequence with zero or more elements left out; the remaining elements keep their left-to-right order but need not be consecutive. A common subsequence of two sequences X and Y is a subsequence of both X and Y. A longest common subsequence is a common subsequence of maximum length. For example, if X = {A,B,C,B,D,A,B} and Y = {B,D,C,A,B,A}, then the longest common subsequences have length 4; two of them are {B,C,B,A} and {B,D,A,B}.
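The definition can be checked mechanically. The following is a minimal sketch, assuming sequences are stored as C strings; the function names is_subsequence and is_common_subsequence are illustrative and do not come from the program further down.

```c
#include <stdbool.h>

/* Returns true if z can be obtained from x by deleting zero or more
   characters without reordering the remaining ones. */
bool is_subsequence(const char *z, const char *x)
{
    /* Walk through x once, advancing in z whenever the characters match. */
    while (*z != '\0' && *x != '\0') {
        if (*z == *x) {
            z++;
        }
        x++;
    }
    /* z is a subsequence exactly when all of it has been matched. */
    return *z == '\0';
}

/* z is a common subsequence of x and y if it is a subsequence of both. */
bool is_common_subsequence(const char *z, const char *x, const char *y)
{
    return is_subsequence(z, x) && is_subsequence(z, y);
}
```

With X = "ABCBDAB" and Y = "BDCABA", both "BCBA" and "BDAB" pass this check. The check only confirms that a candidate is common to both sequences; it does not show that the candidate is the longest one.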

Finding the longest common subsequence has applications in areas like biology. The longest common subsequence (LCS) problem has the optimal substructure property: an LCS of two sequences is built from LCSs of their prefixes. Thus, the dynamic programming method can be used to solve this problem.

Theorem used - Let X = <x_1, x_2, ..., x_m> and Y = <y_1, y_2, ..., y_n> be sequences, and let Z = <z_1, z_2, ..., z_k> be any LCS of X and Y. Here X_i denotes the prefix <x_1, ..., x_i> of X, and similarly for Y and Z.

1. If x_m = y_n, then z_k = x_m = y_n and Z_{k-1} is an LCS of X_{m-1} and Y_{n-1}.
2. If x_m ≠ y_n, then z_k ≠ x_m implies that Z is an LCS of X_{m-1} and Y.
3. If x_m ≠ y_n, then z_k ≠ y_n implies that Z is an LCS of X and Y_{n-1}.
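The three cases of the theorem translate almost directly into a recursive function on prefix lengths. The sketch below is one possible top-down formulation with memoization, not the program given later on this page; the names lcs_prefix and lcs_length_topdown are hypothetical, and the sequences are assumed to be C strings.

```c
#include <stdlib.h>
#include <string.h>

/* Length of an LCS of the first i characters of x and the first j
   characters of y. memo has (n + 1) columns; -1 means "not computed yet". */
static int lcs_prefix(const char *x, const char *y, int i, int j,
                      int *memo, int n)
{
    int *slot;

    if (i == 0 || j == 0) {
        return 0;                       /* an empty prefix has no common elements */
    }
    slot = &memo[(size_t)i * (size_t)(n + 1) + (size_t)j];
    if (*slot >= 0) {
        return *slot;                   /* subproblem already solved */
    }
    if (x[i - 1] == y[j - 1]) {
        /* Case 1: the last elements match and extend an LCS of the shorter prefixes. */
        *slot = lcs_prefix(x, y, i - 1, j - 1, memo, n) + 1;
    } else {
        /* Cases 2 and 3: drop the last element of x or of y, keep the better result. */
        int drop_x = lcs_prefix(x, y, i - 1, j, memo, n);
        int drop_y = lcs_prefix(x, y, i, j - 1, memo, n);
        *slot = drop_x > drop_y ? drop_x : drop_y;
    }
    return *slot;
}

/* Returns the LCS length of x and y, or -1 if the memo table cannot be allocated. */
int lcs_length_topdown(const char *x, const char *y)
{
    int m = (int)strlen(x);
    int n = (int)strlen(y);
    size_t cells = (size_t)(m + 1) * (size_t)(n + 1);
    size_t k;
    int result;
    int *memo = malloc(cells * sizeof *memo);

    if (memo == NULL) {
        return -1;
    }
    for (k = 0; k < cells; k++) {
        memo[k] = -1;
    }
    result = lcs_prefix(x, y, m, n, memo, n);
    free(memo);
    return result;
}
```

Without the memo table the same recursion would solve overlapping subproblems repeatedly and could take exponential time. The recursion depth can reach m + n, so for long inputs a bottom-up table is usually the safer choice.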




C Program Source Code for the Longest Common Subsequence problem

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Maximum input length, including the terminating '\0'. The table below needs
   (Slength+1) * (Tlength+1) ints, so very long inputs need a lot of memory. */
#define maxn 1000

int max(int a, int b)
{
    return a > b ? a : b;
}

/* Returns the length of the longest common subsequence of S and T,
   or -1 if memory for the table cannot be allocated. */
int LongestCommonSubsequence(const char S[], const char T[])
{
    int Slength = (int)strlen(S);
    int Tlength = (int)strlen(T);
    int iter, jter, result;

    /* common[i][j] represents the length of the longest common subsequence of
       the first i characters of S and the first j characters of T.
       Table indices start from 1 for our convenience (avoids handling special
       cases for negative indices); the i-th character of S is S[i-1]. */
    int (*common)[Tlength + 1] = malloc((size_t)(Slength + 1) * sizeof *common);
    if (common == NULL)
    {
        return -1;
    }

    /* Recurrence: common[i][j] = common[i-1][j-1] + 1                  if S[i-1] == T[j-1]
                                = max(common[i-1][j], common[i][j-1])  otherwise */

    /* common[0][j] = 0 for all j, because there are no characters from string S */
    for (iter = 0; iter <= Tlength; iter++)
    {
        common[0][iter] = 0;
    }
    /* common[i][0] = 0 for all i, because there are no characters from string T */
    for (iter = 0; iter <= Slength; iter++)
    {
        common[iter][0] = 0;
    }

    for (iter = 1; iter <= Slength; iter++)
    {
        for (jter = 1; jter <= Tlength; jter++)
        {
            if (S[iter - 1] == T[jter - 1])
            {
                common[iter][jter] = common[iter - 1][jter - 1] + 1;
            }
            else
            {
                common[iter][jter] = max(common[iter][jter - 1], common[iter - 1][jter]);
            }
        }
    }

    result = common[Slength][Tlength];
    free(common);
    return result;
}

int main()
{
    /* S, T are the two strings for which we have to find the longest common subsequence. */
    char S[maxn], T[maxn];

    /* The field width 999 must stay below maxn so that the reads cannot overflow. */
    if (scanf("%999s%999s", S, T) != 2)
    {
        return 1;
    }
    printf("%d\n", LongestCommonSubsequence(S, T));
    return 0;
}

Rough notes about the Algorithm implemented in the code above:

S and T are the two strings for which we have to find the longest common subsequence. Read the two sequences as input, then print the length of their longest common subsequence as computed by the LongestCommonSubsequence function.

LongestCommonSubsequence function: This function takes the two sequences (S, T) as arguments and returns the length of their longest common subsequence.

Store the lengths of both sequences: Slength = strlen(S), Tlength = strlen(T). Table indices start from 1 for our convenience (this avoids handling special cases for negative indices); index 0 stands for the empty prefix.
Declare common[Slength+1][Tlength+1], where common[i][j] represents the length of the longest common subsequence of the first i characters of S and the first j characters of T.
If there are no characters from string S, common[0][j] = 0 for all j; if there are no characters from string T, common[i][0] = 0 for all i.
Recurrence: for i = 1 to Slength
                for j = 1 to Tlength
                   common[i][j] = common[i-1][j-1] + 1 if the i-th character of S equals the j-th character of T; otherwise common[i][j] = max(common[i-1][j], common[i][j-1]), where max is a function that returns the larger of its two arguments.
Return common[Slength][Tlength]. Filling the table takes O(Slength x Tlength) time and space.
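The table holds more than the final length: walking back from common[Slength][Tlength] towards common[0][0] recovers one actual longest common subsequence. The sketch below is an illustrative extension under that assumption, not part of the program above; the function name lcs_recover and the caller-supplied buffer out are hypothetical, and the table is stored as a flat array for portability.

```c
#include <stdlib.h>
#include <string.h>

/* Writes one longest common subsequence of S and T into out (terminated by
   '\0') and returns its length, or -1 if the table cannot be allocated.
   out must have room for min(strlen(S), strlen(T)) + 1 characters. */
int lcs_recover(const char *S, const char *T, char *out)
{
    size_t m = strlen(S), n = strlen(T);
    size_t cols = n + 1;            /* row width of the flat table */
    size_t i, j, k;
    int length;
    int *common = malloc((m + 1) * cols * sizeof *common);

    if (common == NULL) {
        return -1;
    }

    /* Fill the table with the same recurrence as described above. */
    for (i = 0; i <= m; i++) {
        for (j = 0; j <= n; j++) {
            if (i == 0 || j == 0) {
                common[i * cols + j] = 0;
            } else if (S[i - 1] == T[j - 1]) {
                common[i * cols + j] = common[(i - 1) * cols + (j - 1)] + 1;
            } else {
                int up = common[(i - 1) * cols + j];
                int left = common[i * cols + (j - 1)];
                common[i * cols + j] = up > left ? up : left;
            }
        }
    }

    /* Walk back from the bottom-right cell, filling out from its end. */
    length = common[m * cols + n];
    k = (size_t)length;
    out[k] = '\0';
    i = m;
    j = n;
    while (i > 0 && j > 0) {
        if (S[i - 1] == T[j - 1]) {
            out[--k] = S[i - 1];    /* this character belongs to the LCS */
            i--;
            j--;
        } else if (common[(i - 1) * cols + j] >= common[i * cols + (j - 1)]) {
            i--;                    /* the value came from the row above */
        } else {
            j--;                    /* the value came from the column to the left */
        }
    }

    free(common);
    return length;
}
```

When several longest common subsequences exist, which one is produced depends on how ties are broken in the walk back (here, the row above is preferred). The extra pass costs only O(Slength + Tlength) steps on top of filling the table.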


Related Tutorials (common examples of Dynamic Programming):

Integer Knapsack problem - An elementary problem, often used to introduce the concept of dynamic programming.

Matrix Chain Multiplication - Given a long chain of matrices of various sizes, how do you parenthesize them for the purpose of multiplication, that is, how do you choose which ones to multiply first?

Longest Common Subsequence - Given two strings, find the longest common subsequence between them.

